Abstract

The Markov chain approximation method is a widely used and efficient
family of methods for the numerical solution of a large part of the stochastic
control problems in continuous time for reflected-jump-diffusion-type
models. It converges under broad conditions, and there are good algorithms
for solving the numerical approximations if the dimension is not
too high. It has been extended to zero-sum stochastic differential games.

We apply the method to a class of non-zero-sum stochastic differential
games with a diffusion system model where the controls for the two
players are separated in the dynamics and cost function. There have been
successful applications of the algorithms, but convergence proofs have
been lacking. It is shown that equilibrium values for the approximating
chain converge to equilibrium values for the original process and that
any equilibrium value for the original process can be approximated by
an $\epsilon$-equilibrium for the chain for arbitrarily small $\epsilon > 0$.
The numerical method solves a stochastic game for a finite-state Markov chain.

1 Introduction

The aim of this paper is to extend the Markov chain approximation method
to numerically solve non-zero-sum stochastic differential games. The method
is widely used, robust, and relatively easy to use. It covers the majority of
stochastic control problems in continuous time, for controlled
reflected-jump-diffusion type models that have been of interest to date, and
converges under broad conditions. For the control problem there are good
algorithms for solving the numerical problems, if the dimension is not too high
[15]. The method was extended to zero-sum stochastic differential games in
[12, 13, 14], with the last two references concerned with the ergodic cost case,
extending partial prior results such as [1, 17, 18]. There has been successful
numerical work on non-zero-sum differential games [8, 9], based on the Markov
chain approximation method, but there does not seem to be any available theory
concerning convergence. Works such as [3] are concerned with approximations to
non-zero-sum games in normal form and do not apply to the system models or the
type of approximations that appear in our numerical approximations.

We will consider a discounted cost problem for a diffusion model in a
hyperrectangle $G$, with absorption on the boundary. The state space $G$ and the
boundary absorption are selected only to simplify the development so that we
can concentrate on the issues that are unique to the non-zero-sum case. One
can replace the hyperrectangle and boundary absorption by an arbitrary convex
polyhedron with boundary reflection, if the reflection directions satisfy the
conditions in [15] or in [12]. The hyperrectangular state space is often used for
purely numerical reasons, to assure a bounded state space, and then it would
be large enough so that it would not interfere with the values for the initial
conditions of main interest. We will work with two-player games. Any number
of players can be dealt with, but we stick to two for notational simplicity. The
non-zero-sum game is difficult because, as opposed to the zero-sum case, the
players are not strictly competitive and have their own value functions.

The idea of the Markov chain approximation method is to first approximate
the controlled diffusion dynamics by a suitable Markov chain on a finite state
space with a discretization parameter $h$, then approximate the cost functions.
One solves the game problem for the simpler chain model, and then proves
that the value functions associated with equilibrium or $\epsilon$-equilibrium
strategies for the chain converge to the value functions associated with
equilibrium or $\epsilon_1$-equilibrium strategies for the diffusion model, where
$\epsilon_1 \to 0$ as $\epsilon \to 0$. The methods of proof are purely
probabilistic. No PDE techniques are required, so no knowledge of whatever PDEs
yield the equilibrium values is needed.¹ Such methods have the advantage of
providing intuition concerning numerical approximations, they cover the bulk of
the problem formulations to date, and they converge under quite general
conditions. The essential condition is a natural “local consistency” condition.
Getting approximations satisfying this condition is usually straightforward.
Many methods are discussed in [15] and all of them are applicable to the game
problem of interest here. Furthermore, the numerical approximations are
represented as processes which are close to the original, which gives the method
intuitive meaning. We are not concerned with algorithms for numerically solving
the game for the chain model, only with showing convergence of the solutions to
the desired values as the discretization parameter goes to zero.

¹ At present there seems to be no information available concerning the PDEs
that yield the values.
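To make the “local consistency” condition concrete, the following is a minimal sketch, in Python, of one standard upwind construction of the kind discussed in [15], for a one-dimensional diffusion with a fixed control value. The function name `local_transitions`, the callables `b` and `sigma`, and the restriction to one dimension are assumptions made for illustration only. In the game problem the drift would also depend on the control values of both players.

```python
def local_transitions(x, h, b, sigma):
    """Return (p_up, p_down, dt) for a 1-D chain on the grid {..., x-h, x, x+h, ...}.

    Illustrative assumption: b(x) and sigma(x) are the drift and diffusion
    coefficients at a fixed control value, with sigma(x)**2 + h*|b(x)| > 0.
    """
    drift = b(x)
    var = sigma(x) ** 2
    q = var + h * abs(drift)                      # normalizing factor
    p_up = (var / 2.0 + h * max(drift, 0.0)) / q  # move to x + h
    p_down = (var / 2.0 + h * max(-drift, 0.0)) / q  # move to x - h
    dt = h * h / q                                # interpolation interval
    return p_up, p_down, dt
```

With these choices the conditional mean of the increment is exactly `b(x) * dt`, and its conditional second moment is `h**2 = dt * (sigma(x)**2 + h * abs(b(x)))`, which equals `sigma(x)**2 * dt` up to a term that is small relative to `dt` as `h` goes to zero. This is the sense in which the chain matches the local properties of the diffusion.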

In Section 2, the model and the cost functions for the players are defined,
the boundary conditions are discussed, and various background material is given. A
“uniform  in  the  controls”  discrete-time  approximation  that  will  be  used  in  the 
sequel  is  also  given.  The  convergence  proof  depends  heavily  on  the  fact  that  the 
original  diffusion  process  can  be  approximated  (uniformly  in  the  controls),  with 
various  approximations  to  the  controls,  and  the  needed  results  are  developed 
in Section 3. A particular representation of an $\epsilon$-equilibrium strategy, in terms
of  a  “smooth”  conditional  probability,  depending  only  on  selected  samples  of 
the  driving  Wiener  process  (and  not  on  the  entire  Wiener  process),  is  given 
in  Section  4.  Various  facts  concerning  the  Markov  chain  approximation  are 
collected  in  Section  5.  The  reader  is  referred  to  [15]  for  a  fuller  treatment.  The 
chain  is  represented  in  terms  of  a  driving  martingale,  and  this  representation  is 
used  to  get  analogs  of  the  results  in  Section  3  that  use  approximations  to  the 
chain  to  show  that  the  probability  law  of  the  chain  and  the  costs  change  little 
if  the  control  process  is  approximated  in  various  ways.  These  results  are  new 
and  should  be  more  broadly  useful  in  dealing  with  numerical  approximations. 
Theorem 6.1 in Section 6 shows that an “approximate” equilibrium (value or
strategy) for the diffusion is an “approximate” equilibrium (value or strategy)
for the chain for small discretization parameter $h$. If the $\epsilon$-equilibrium
value for the chain is unique for small $\epsilon > 0$, then the convergence proof
is complete since an “approximate” equilibrium value for the chain is also one
for the diffusion.
If  the  value  is  not  unique  then  the  proof  of  this  last  fact  is  more  difficult,  and 
we  restrict  attention  to  the  case  where  the  diffusion  coefficient  does  not  depend 
on  the  state.  This  is  done  in  Theorem  6.2,  which  is  a  consequence  of  Theorem 
5.6,  which,  in  turn,  applies  a  strong  approximation  theorem  to  show  that  the 
discrete  time  approximation  to  the  diffusion  and  that  for  the  interpolated  chain 
are  very  close,  uniformly  in  the  controls. 


2  The  Model 

We consider systems of the form, where $x(t) \in \mathbb{R}^v$, Euclidean $v$-space,

$$x(t) = x(0) + \int_0^t \sum_{i=1}^{2} b_i(x(s), u_i(s))\,ds + \int_0^t \sigma(x(s))\,dw(s), \tag{2.1}$$

where player $i$, $i = 1,2$, has control $u_i(\cdot)$ and cost function

$$W_i(x,u) = E_x^u \int_0^\tau e^{-\beta s} k_i(x(s), u_i(s))\,ds + E_x^u e^{-\beta\tau} g_i(x(\tau)). \tag{2.2}$$

Condition (A2.1) below holds, $\beta > 0$, $\tau$ is the first time that the boundary
of $G$ is hit (it equals infinity if the boundary is never reached), and $w(\cdot)$
is a standard vector-valued Wiener process. The symbol $E_x^u$ denotes the
expectation given the use of control $u(\cdot) = (u_1(\cdot), u_2(\cdot))$ and initial
condition $x$. Define $b(\cdot) = b_1(\cdot) + b_2(\cdot)$,
$k(\cdot) = k_1(\cdot) + k_2(\cdot)$.

A2.1. The functions $b_i(\cdot)$ and $\sigma(\cdot)$ are bounded and continuous and
Lipschitz continuous in $x$, uniformly in $u$. The controls $u_i(\cdot)$ for player
$i$ take values in $U_i$, a compact set in some Euclidean space, and the functions
$k_i(\cdot)$ and $g_i(\cdot)$ are bounded and continuous.

A control $u_i(\cdot)$ is said to be in $\mathcal{U}_i$, the set of admissible
controls for player $i$, if it is measurable, non-anticipative with respect to
$w(\cdot)$, and $U_i$-valued. Later we will introduce strategies and admissible
relaxed controls. The methods of proof use a weak convergence analysis as in
[15], and to the extent possible we use the results of that reference. For $S$ a
topological space, let $D[S;0,\infty)$ denote the $S$-valued functions on
$[0,\infty)$ that are right continuous and have left hand limits, with the
Skorohod topology [5, 15] used. If $S = \mathbb{R}^v$, then we write
$D[S;0,\infty) = D^v[0,\infty)$.

The first hitting time $\tau$. Getting numerical solutions requires working in a
bounded state space. Often the physics of the problem provide both a bounded
state space and the proper boundary conditions. Otherwise, “numerical”
boundaries are added. In any case, one needs to provide the necessary boundary
conditions. These will be equivalent to either reflection or absorption at the
boundary. Both are covered in [15]. Here, we chose boundary absorption, but
the details that are unique to the non-zero-sum game problem would be the
same in both cases.

The nature of the hitting time $\tau$ of the boundary of the set $G$ poses a
particular concern from the point of view of the convergence of the numerical
algorithm. The proof of convergence generates a sequence of process
approximations (continuous-time interpolations of the approximating chain) and
the exit or boundary hitting times of this sequence have to converge, in an
appropriate probabilistic sense, to the exit time of (2.1). In fact, no matter
what the numerical procedure, something analogous must take place. In order to
see the problem, refer to Figure 1.

In the figure, the sequence of functions $\phi_n(\cdot)$ converges to the limit
function $\phi_0(\cdot)$, but the sequence of first contact times $(\tau_n)$ of
$\phi_n(\cdot)$ converges to a time $\tau_0$ which is not the moment $\tau$ of first
contact of $\phi_0(\cdot)$ with the boundary line $\partial G$ of $G$. The problem in
this case is that the limit function $\phi_0(\cdot)$ is tangent to $\partial G$ at the
time of first contact.

For our control problem, if the approximating costs are to converge to the
costs for (2.1), (2.2), then we need to assure (at least with probability one)
that the paths of the limit $x(\cdot)$ are not “tangent” to $\partial G$ at the moment
$\tau$ of first hitting the boundary. For $\phi(\cdot)$ in $D^v[0,\infty)$ (with the
Skorohod topology used), define the function $\hat\tau(\phi)$ with values in the
compactified infinite interval $\bar{\mathbb{R}}^+ = [0,\infty]$ by:
$\hat\tau(\phi) = \infty$ if $\phi(t) \in G^0$, the interior of $G$, for all
$t < \infty$, and otherwise use

$$\hat\tau(\phi) = \inf\{t : \phi(t) \notin G^0\}.$$

Figure 1: Continuity of first exit times.


In the example of Figure 1, $\hat\tau(\cdot)$ is not continuous at the path $\phi_0(\cdot)$.
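The discontinuity of the first exit time at a tangent path can be checked numerically. The sketch below is an illustration only: the particular path `phi0`, the boundary level 1 (so that the interior is the set of values below 1), and the grid resolution are assumptions chosen for this example and do not come from the figure. The path touches the boundary tangentially at time 1 and crosses it only at time 3, while the shifted paths `phi0 - 1/n` converge to it uniformly but never touch the boundary near time 1.

```python
def phi0(t):
    # Touches the level 1 tangentially at t = 1, then rises through it at t = 3.
    return 1.0 - (t - 1.0) ** 2 if t <= 2.0 else t - 2.0

def phi_n(n):
    # Uniformly within 1/n of phi0, but strictly below the level 1 near t = 1.
    return lambda t: phi0(t) - 1.0 / n

def first_exit_time(path, level=1.0, horizon=5.0, steps=5000, tol=1e-12):
    """First grid time at which the path is no longer in the open set {x < level}."""
    for k in range(steps + 1):
        t = horizon * k / steps
        if path(t) >= level - tol:
            return t
    return float("inf")

tau0 = first_exit_time(phi0)                                   # exit time of the limit path
taus = [first_exit_time(phi_n(n)) for n in (10, 100, 1000)]    # exit times along the sequence
```

Here `tau0` is the tangency time 1, while the entries of `taus` stay close to `3 + 1/n`, so the exit times of the approximating paths converge to 3 rather than to the exit time of the limit.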

If the $\phi_0(\cdot)$ in the figure were a sample path of a Wiener process
$w(\cdot)$, then the probability is zero that it would be “tangent” to the
boundary of $G$ at the point of first contact. Indeed, w.p.1, the path would
cross the line infinitely often in any small time interval after first contact.
Hence, w.p.1, the first hitting times of any approximating sequence would have
to converge to the hitting time of $w(\cdot)$. The situation is similar if the
Wiener process were replaced by the solution to a stochastic differential
equation with a uniformly positive definite covariance matrix
$a(x) = \sigma(x)\sigma'(x)$. The following condition will be used. Note that
the condition can be assured to hold if the randomized stopping approximation
discussed below is used.

A2.2. For each initial condition and control, the function $\hat\tau(\cdot)$ is
continuous (as a map from $D^v[0,\infty)$ to the compactified interval
$[0,\infty]$) with probability one relative to the measure induced by the
solution to (2.1).

The tangency problem would be a concern with any numerical method,
since they all depend on some sort of approximation. For example, the
convergence theorems for the classical finite difference methods for elliptic
and parabolic equations generally use a nondegeneracy condition on $a(x)$ in
order to (implicitly) guarantee (A2.2). In fact, one can always add an
independent $v$-dimensional Wiener process with small variance to (2.1), which
will assure (A2.2), while changing the costs arbitrarily little.

The verification of (A2.2) for the case where $a(x)$ is degenerate is more
complicated, and one needs to work with the particular structure of the
individual case. The boundary can often be divided into several pieces, where we
are able to treat each piece separately. For example, there might be a segment
where a “directional nondegeneracy” of $a(x)$ guarantees the almost sure
continuity of the exit times of the paths which exit on that segment, plus a
segment where the direction of the drift gives a similar guarantee, plus a
segment on which escape is not possible, and a “remaining” segment. Frequently,
the last “complementary” set is a finite set of points or a curve of lower
dimension than that of the boundary. Special considerations concerning these
points can often resolve the issue there. An important class of such degenerate
examples is illustrated in [10, pp. 64-66]. In that two-dimensional example, $G$
is the symmetric square box centered about the origin and the system is
($x = (x_1, x_2)$)

$$dx_1 = x_2\,dt, \qquad dx_2 = u\,dt + dw,$$

and the control $u(\cdot)$ is bounded. The above cited “complementary set” is just
the two points which are the intersections of the horizontal axis with the
boundary, and these points can be taken care of by a test such as that in
Theorem 6.1 of [16].²

Randomized stopping. An alternative to (A2.2). The boundaries in
control problems are often not fixed precisely. For example, they might be
introduced simply to bound the state space. The original control problem might
be defined in an unbounded space, but the space is then truncated for numerical
reasons. Even if there is a given “target set,” it is often not necessary to fix
it too precisely. Such considerations give us some freedom to vary the boundary
slightly. The “randomized stopping” alternative discussed next exploits these
ideas and assures (A2.2). Under randomized stopping, the probability of
stopping at time $t$ (if the process has not yet been stopped) goes to unity as
$x(t)$ at that time approaches $\partial G$. This can be formalized as follows [15].

For some small $\epsilon > 0$, let $\lambda(\cdot) > 0$ be a continuous function on the
set $N_\epsilon(\partial G) \cap G^0$, where $N_\epsilon(\partial G)$ is the
$\epsilon$-neighborhood of the boundary and $G^0$ is the interior of $G$. Let
$\lambda(x) \to \infty$ as $x$ converges to $\partial G$. Then stop $x(\cdot)$ at time
$t$ with stopping rate $\lambda(x(t))$ and stopping cost $g_i(x(t))$ for player $i$.
Randomized stopping is equivalent to adding an additional (and state dependent)
discount factor which is active near the boundary.
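As an illustration of randomized stopping, the sketch below uses one hypothetical stopping rate that is continuous, positive inside the $\epsilon$-neighborhood of the boundary, zero outside it, and unbounded as the distance to the boundary goes to zero. The specific formula `1/dist - 1/eps`, the function names, and the use of the distance to the boundary as the only argument are assumptions for this example. The text requires only the qualitative properties of the rate.

```python
import math

def stopping_rate(dist, eps):
    """Hypothetical rate: zero at distance >= eps, growing without bound at the boundary."""
    if dist >= eps:
        return 0.0
    if dist <= 0.0:
        return math.inf
    return 1.0 / dist - 1.0 / eps

def stop_probability(dist, eps, dt):
    """Probability of stopping during a short step of length dt, given no earlier stop.

    The survival factor exp(-rate * dt) plays the role of the additional
    state-dependent discount factor that is active near the boundary.
    """
    return 1.0 - math.exp(-stopping_rate(dist, eps) * dt)
```

For a state well inside the set the stopping probability over a step is zero, and it increases to one as the state approaches the boundary, so a path cannot graze the boundary without being stopped.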

Relaxed controls. In control theory, when working with problems concerning
convergence of sequences or approximations, it is usual to use the so-called
relaxed controls in lieu of ordinary controls. They are used for theoretical
purposes only, for the purposes of getting approximation and convergence proofs.
Suppose that for some filtration $\{\mathcal{F}_t, t < \infty\}$, standard
vector-valued $\mathcal{F}_t$-Wiener process $w(\cdot)$ and for $i = 1,2$,
$r_i(\cdot)$ is a measure on the Borel sets of $U_i \times [0,\infty)$ such that
$r_i(U_i \times [0,t]) = t$ and the process $r_i(A \times [0,\cdot])$ is measurable
and non-anticipative for each Borel set $A \subset U_i$. Then $r_i(\cdot)$ is said to
be an admissible relaxed control for player $i$ with respect to $w(\cdot)$ [6, 15].
Abusing notation slightly, we use $\mathcal{U}_i$ for the set of admissible relaxed
controls as well as for the set of admissible ordinary controls $u_i(\cdot)$. If the
Wiener process and filtration are obvious or unimportant, we simply say that
$r_i(\cdot)$ is an admissible relaxed control for player $i$. For Borel sets
$A \subset U_i$, we will write $r_i(A \times [t_0,t_1]) = r_i(A,[t_0,t_1])$, and write
$r_i(A,t_1)$ if $t_0 = 0$. Define $U = U_1 \times U_2$ and
$\mathcal{U} = \mathcal{U}_1 \times \mathcal{U}_2$. Henceforth $\{\mathcal{F}_t\}$ will
denote a filtration such that $w(\cdot)$ is an $\mathcal{F}_t$-standard Wiener process
and $r(\cdot)$ is admissible, for the $r(\cdot)$ of concern.

² See also [15, p. 280, second ed.], where it is shown that the Girsanov
transformation can play a useful role in the verification of (A2.2).

For almost all $(\omega, t)$ and each Borel $A \subset U_i$, one can define the left
derivative³

$$r_i'(A,t) = \lim_{\delta \to 0} \frac{r_i(A,t) - r_i(A,t-\delta)}{\delta}.$$

Without loss of generality, we can suppose that the limit exists for all
$(\omega,t)$. Then for all $(\omega,t)$, $r_i'(\cdot,t)$ is a probability measure on the
Borel sets of $U_i$ and for any bounded Borel set $B$ in $U_i \times [0,\infty)$,

$$r_i(B) = \int_0^\infty \int_{U_i} I_{\{(\alpha_i,t) \in B\}}\, r_i'(d\alpha_i,t)\,dt.$$

An ordinary control $u_i(\cdot)$ can be represented in terms of the relaxed control
$r_i(\cdot)$ that is defined by its derivative, which takes the form
$r_i'(A,t) = I_A(u_i(t))$, where $I_A(u_i)$ is unity if $u_i \in A$ and is zero
otherwise. The weak topology [15] will be used on the space of admissible
relaxed controls. Relaxed controls are commonly used in control theory to prove
existence and approximation theorems, since any sequence of relaxed controls
has a weakly convergent subsequence. The use of relaxed controls does not
change the range of values of the cost functions.

Define the “product” relaxed control $r(\cdot)$ by its derivative
$r'(\cdot,t) = r_1'(\cdot,t) \times r_2'(\cdot,t)$. Thus $r(\cdot)$ is a product
measure, with marginals $r_i(\cdot)$, $i = 1,2$. We will usually write
$r(\cdot) = (r_1(\cdot), r_2(\cdot))$ without ambiguity. The pair
$(w(\cdot), r(\cdot))$ is called an admissible pair if each of the $r_i(\cdot)$ is
admissible with respect to $w(\cdot)$. In relaxed control terminology, (2.1) and
(2.2) are written as


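$$x(t) = x(0) + \sum_{i=1}^{2} \int_0^t \int_{U_i} b_i(x(s),\alpha_i)\, r_i'(d\alpha_i,s)\,ds + \int_0^t \sigma(x(s))\,dw(s), \tag{2.3}$$

$$W_i(x,r) = E_x^r \int_0^\tau e^{-\beta s} \int_{U_i} k_i(x(s),\alpha_i)\, r_i'(d\alpha_i,s)\,ds + E_x^r e^{-\beta\tau} g_i(x(\tau)). \tag{2.4}$$

The drift terms can be written as (e.g.) $\int_U b(x(s),\alpha)\, r'(d\alpha,s)\,ds$.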


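A discrete time system. We will also have need for the discrete time form

$$x^\Delta(n\Delta+\Delta) = x^\Delta(n\Delta) + \int_{n\Delta}^{n\Delta+\Delta} \int_U b(x^\Delta(n\Delta),\alpha)\, r'(d\alpha,s)\,ds + \sigma(x^\Delta(n\Delta))\,[w(n\Delta+\Delta) - w(n\Delta)]. \tag{2.5}$$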

We can define the continuous time interpolation $x^\Delta(\cdot)$ either by
$x^\Delta(t) = x^\Delta(n\Delta)$ for $t \in [n\Delta, n\Delta+\Delta)$, or as (on the
same interval)

$$x^\Delta(t) = x^\Delta(n\Delta) + \int_{n\Delta}^{t} \int_U b(x^\Delta(n\Delta),\alpha)\, r'(d\alpha,s)\,ds + \int_{n\Delta}^{t} \sigma(x^\Delta(n\Delta))\,dw(s), \tag{2.6}$$

where it is assumed that $r'(\cdot,t)$ is adapted to $\mathcal{F}_{n\Delta-}$, for
$t \in [n\Delta, n\Delta+\Delta)$. The associated cost function $W_i^\Delta(x,r)$ is
(2.4) with $x^\Delta(\cdot)$ replacing $x(\cdot)$. Let $r^\Delta(\cdot)$, $r(\cdot)$ be
admissible relaxed controls with respect to $w(\cdot)$ with
$r^\Delta(\cdot) \to r(\cdot)$ w.p.1 (in the weak topology) and $r^\Delta(\cdot)$
adapted as above. Then, as $\Delta \to 0$, the sequence of solutions
$\{x^\Delta(\cdot)\}$ also converges w.p.1, uniformly on any bounded time interval,
and the limit $(x(\cdot), r(\cdot), w(\cdot))$ solves (2.3). By (A2.2), the first
hitting times of the boundary also converge w.p.1 to that of the limit. The
costs converge as well. The analogous result holds if the randomized stopping
alternative is used.

³ “Left” is used because we need the derivative to be non-anticipative.

Randomized vs. relaxed controls. For the discrete time system (2.5) or
(2.6), the relaxed control can be approximated by a randomized ordinary
control (which relates the relaxed control to randomized strategies), as follows.
Let $r(\cdot)$ be a relaxed control that is admissible with respect to $w(\cdot)$. Let
$u_{i,n}^\Delta$ be a random variable with the (conditional on
$\mathcal{F}_{n\Delta}$) distribution
$r_{i,n}^{\Delta,\prime}(\cdot) = E_{n\Delta}[r_i(\cdot,[n\Delta,n\Delta+\Delta])]/\Delta$,
where $E_{n\Delta}$ denotes the conditional expectation given $\mathcal{F}_{n\Delta}$.
Set $u_n^\Delta = (u_{1,n}^\Delta, u_{2,n}^\Delta)$, define its continuous-time
interpolation (with intervals $\Delta$) $u^\Delta(\cdot)$, and define the process
$\tilde{x}^\Delta(\cdot)$ by $\tilde{x}^\Delta(0) = x^\Delta(0) = x(0)$ and

$$\tilde{x}^\Delta(n\Delta+\Delta) = \tilde{x}^\Delta(n\Delta) + \Delta\, b(\tilde{x}^\Delta(n\Delta), u_n^\Delta) + \sigma(\tilde{x}^\Delta(n\Delta))\,[w(n\Delta+\Delta) - w(n\Delta)]. \tag{2.7}$$

Let $\tilde{x}^\Delta(t)$ denote the continuous time interpolation. Then we have the
following result, where the relaxed control $r^\Delta(\cdot)$ that is used for
$x^\Delta(\cdot)$ has the derivative
$r^{\Delta,\prime}(\cdot) = r_{1,n}^{\Delta,\prime}(\cdot)\, r_{2,n}^{\Delta,\prime}(\cdot)$
on $[n\Delta, n\Delta+\Delta)$. The theorem implies that in the continuous limit,
randomized controls turn into relaxed controls.

Theorem 2.1. Assume (A2.1). Then for any $T < \infty$,

$$\lim_{\Delta \to 0}\ \sup_{x(0) \in G}\ \sup_{r \in \mathcal{U}}\ E \sup_{t \le T} |x^\Delta(t) - x(t)|^2 = 0, \tag{2.8a}$$

$$\lim_{\Delta \to 0}\ \sup_{x(0) \in G}\ \sup_{r \in \mathcal{U}}\ E \sup_{t \le T} |x^\Delta(t) - \tilde{x}^\Delta(t)|^2 = 0. \tag{2.8b}$$

Under the additional condition (A2.2) the costs for (2.5) and (2.7) converge
(uniformly in $x(0)$, $r(\cdot)$) to those for (2.3) as well.
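The statement (2.8b) can be explored with a small simulation. The following is a minimal sketch under several assumptions that are not in the text: a scalar state, a single player with the two control values +1 and -1, the drift `b(x, a) = a - tanh(x)`, a constant diffusion coefficient, and a deterministic relaxed control that puts mass `p(t) = 0.5 + 0.4 sin(t)` on the value +1. The function name `mean_sq_gap` is hypothetical. One process uses the drift averaged over the control distribution, as in (2.5), and the other draws one control value per step from the same distribution, as in (2.7), with the same Wiener increments.

```python
import math
import random

def mean_sq_gap(delta, horizon=1.0, paths=200, sigma=0.5, seed=0):
    """Monte Carlo estimate of E sup_n |x(n*delta) - x_tilde(n*delta)|^2."""
    rng = random.Random(seed)
    steps = int(round(horizon / delta))
    total = 0.0
    for _ in range(paths):
        x = x_tilde = 0.0
        worst = 0.0
        for n in range(steps):
            t0, t1 = n * delta, (n + 1) * delta
            # Average mass on the value +1 over [t0, t1) for p(t) = 0.5 + 0.4 sin(t).
            p_bar = 0.5 + 0.4 * (math.cos(t0) - math.cos(t1)) / delta
            dw = rng.gauss(0.0, math.sqrt(delta))
            # Relaxed control: drift averaged over the control values +1 and -1.
            drift_relaxed = (2.0 * p_bar - 1.0) - math.tanh(x)
            # Randomized control: sample one value with the same distribution.
            u = 1.0 if rng.random() < p_bar else -1.0
            drift_random = u - math.tanh(x_tilde)
            x += delta * drift_relaxed + sigma * dw
            x_tilde += delta * drift_random + sigma * dw
            worst = max(worst, (x - x_tilde) ** 2)
        total += worst
    return total / paths

coarse = mean_sq_gap(0.1)    # large step: randomization is clearly visible
fine = mean_sq_gap(0.001)    # small step: the two processes nearly coincide
```

The gap between the two processes is driven by a sum of martingale differences of size of order `delta` per step, so its mean square over a fixed horizon is of order `delta`, and `fine` should be much smaller than `coarse`.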

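Comment on the proof. Define
$\delta x_n^\Delta = x^\Delta(n\Delta) - \tilde{x}^\Delta(n\Delta)$. Then

$$\delta x_{n+1}^\Delta = \delta x_n^\Delta + \Delta \int_U \left[ b(x^\Delta(n\Delta),\alpha) - b(\tilde{x}^\Delta(n\Delta),\alpha) \right] r_n^{\Delta,\prime}(d\alpha) + \left[ \sigma(x^\Delta(n\Delta)) - \sigma(\tilde{x}^\Delta(n\Delta)) \right] [w(n\Delta+\Delta) - w(n\Delta)] + N_n^\Delta,$$

where

$$N_n^\Delta = \Delta \left[ \int_U b(\tilde{x}^\Delta(n\Delta),\alpha)\, r_n^{\Delta,\prime}(d\alpha) - b(\tilde{x}^\Delta(n\Delta), u_n^\Delta) \right]$$

is an $\mathcal{F}_{n\Delta}$-martingale difference by the definition of
$u^\Delta(\cdot)$ via the conditional distribution given $\mathcal{F}_{n\Delta}$. Also
$E_{n\Delta}|N_n^\Delta|^2 = O(\Delta^2)$. The proof of the uniform (in the control and
initial condition) convergence to zero of
$|x^\Delta(\cdot) - \tilde{x}^\Delta(\cdot)|$ and of the differences between the
integrals

$$E \int_0^t e^{-\beta s} k(\tilde{x}^\Delta(s), u^\Delta(s))\,ds, \qquad E \int_0^t \int_U e^{-\beta s} k(x^\Delta(s),\alpha)\, r^{\Delta,\prime}(d\alpha,s)\,ds$$

can then be completed by using the Lipschitz condition and this martingale and
conditional variance property. This implies (2.8b). An analogous argument can
be used to get (2.8a) for each $r(\cdot)$ and $x(0)$. The facts that (A2.2) holds
for (2.3) and that (2.8) holds imply that the stopping times for
$x^\Delta(\cdot)$, $\tilde{x}^\Delta(\cdot)$ converge to those for (2.3) as well for
each $x(0)$ and $r(\cdot)$.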

The uniformity in (2.8a) and in the convergence of the costs can be proved
by an argument by contradiction that goes roughly as follows. Suppose, for
example, that the uniformity in (2.8a) does not hold. Then take a sequence
$x^m(0)$, $r^m(\cdot)$, $\Delta_m \to 0$, $m = 1,2,\ldots$, and associated solutions
$x^m(\cdot)$ to (2.3). Let $r_n^{m,\Delta_m,\prime}(\cdot)$ be defined as
$r_n^{\Delta,\prime}(\cdot)$ was, but based on $r^m(\cdot)$, and let
$r^{m,\Delta_m}(\cdot)$ denote the interpolation of the associated relaxed control.
Define $x^{m,\Delta_m}(\cdot)$ as the solution to (2.6) with interval $\Delta_m$ and
controls $r^{m,\Delta_m}(\cdot)$ (alternatively, it could be the piecewise constant
interpolation). Suppose that, for some $T < \infty$,

$$\limsup_{m \to \infty} E \sup_{t \le T} |x^{m,\Delta_m}(t) - x^m(t)|^2 > 0.$$

Take an arbitrary weakly convergent subsequence of $x^m(\cdot)$,
$x^{m,\Delta_m}(\cdot)$, $r^m(\cdot)$, $r^{m,\Delta_m}(\cdot)$, $w(\cdot)$, with
(weak-sense) limit denoted by $x(\cdot)$, $\tilde{x}(\cdot)$, $r(\cdot)$,
$\tilde{r}(\cdot)$, $w(\cdot)$. Then it is easy to show that
$x(\cdot) = \tilde{x}(\cdot)$ and $r(\cdot) = \tilde{r}(\cdot)$, that $w(\cdot)$ is a
standard Wiener process, that $x(\cdot)$, $r(\cdot)$ are non-anticipative with
respect to $w(\cdot)$, and that the set satisfies (2.3). Assume, without loss of
generality, that Skorohod representation is used [5, 15], so that we can suppose
that the original and limit processes are all defined on the same probability
space and that convergence is w.p.1 in the Skorohod topology. For any
$T < \infty$, the set of random variables
$\{|x^m(t)|^2, |x^{m,\Delta_m}(t)|^2;\ m,\ t \le T\}$ is uniformly integrable. Thus

$$\lim_{m \to \infty} E \sup_{t \le T} |x^{m,\Delta_m}(t) - x(t)|^2 = 0$$

and

$$\lim_{m \to \infty} E \sup_{t \le T} |x^m(t) - x(t)|^2 = 0,$$

a contradiction to the assertion that the uniformity in $x(0)$ and $r(\cdot)$ in
(2.8a) does not hold. ■

3  Classes  of  Controls  and  Approximations 

The convergence proofs will require the use of special approximations to the
general ordinary or relaxed controls, and the necessary approximations are
developed in this section and in Theorem 4.1.

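For each admissible relaxed control $r(\cdot)$ and $\epsilon > 0$, let
$r_i^\epsilon(\cdot)$ be admissible relaxed controls with respect to the same
filtration and Wiener process $w(\cdot)$, with derivatives
$r_i^{\epsilon,\prime}(\cdot)$, and that satisfy

$$\lim_{\epsilon \to 0}\ \sup_{r_i \in \mathcal{U}_i}\ E \sup_{t \le T} \left| \int_0^t \int_{U_i} \phi_i(\alpha_i) \left[ r_i'(d\alpha_i,s) - r_i^{\epsilon,\prime}(d\alpha_i,s) \right] ds \right| = 0, \quad i = 1,2, \tag{3.1}$$

for each bounded and continuous real-valued nonrandom function $\phi_i(\cdot)$ and
each $T < \infty$. Let $x(\cdot)$ and $x^\epsilon(\cdot)$ denote the solutions to (2.3)
corresponding to $r(\cdot)$ and $r^\epsilon(\cdot)$, respectively, with the same
$w(\cdot)$ used, but perhaps different initial conditions. In particular, define
$x^\epsilon(\cdot)$ by

$$x^\epsilon(t) = x^\epsilon(0) + \int_0^t \int_U b(x^\epsilon(s),\alpha)\, r^{\epsilon,\prime}(d\alpha,s)\,ds + \int_0^t \sigma(x^\epsilon(s))\,dw(s). \tag{3.2}$$

The processes $x(\cdot)$ and $x^\epsilon(\cdot)$ depend on $r(\cdot)$ and
$r^\epsilon(\cdot)$, resp., but this dependence is suppressed in the notation. The
next theorem shows that the solution $x(\cdot)$ is continuous in the controls in
the sense that (3.3) below holds, and that the costs corresponding to $r(\cdot)$
and $r^\epsilon(\cdot)$ are arbitrarily close for small $\epsilon$, uniformly in
$r(\cdot)$.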


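Theorem 3.1. Assume (A2.1). Let $(r(\cdot), r^\epsilon(\cdot))$ satisfy (3.1) for each
bounded and continuous $\phi_i(\cdot)$, $i = 1,2$, and $T < \infty$. Define
$\delta x^\epsilon(t) = x^\epsilon(t) - x(t)$. Then for each $t$

$$\lim_{\epsilon \to 0}\ \sup_{x(0),\,x^\epsilon(0):\ |x^\epsilon(0)-x(0)| \to 0}\ \sup_{r \in \mathcal{U}}\ E \sup_{s \le t} |\delta x^\epsilon(s)|^2 = 0. \tag{3.3}$$

Under the additional condition (A2.2)

$$\lim_{\epsilon \to 0}\ \sup_{x(0),\,x^\epsilon(0):\ |x^\epsilon(0)-x(0)| \to 0}\ \sup_{r \in \mathcal{U}}\ |W_i(x,r) - W_i(x,r^\epsilon)| = 0, \quad i = 1,2. \tag{3.4}$$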


Comments on the proof. The proof is very similar to that of Theorem 2.1,
and we comment only on the use of (3.1). We can write

$$\begin{aligned}
\delta x^\epsilon(t) = \delta x^\epsilon(0) &+ \int_0^t \int_U \left[ b(x^\epsilon(s),\alpha) - b(x(s),\alpha) \right] r'(d\alpha,s)\,ds \\
&+ \int_0^t \left[ \sigma(x^\epsilon(s)) - \sigma(x(s)) \right] dw(s) \\
&+ \int_0^t \int_U b(x^\epsilon(s),\alpha) \left[ r^{\epsilon,\prime}(d\alpha,s) - r'(d\alpha,s) \right] ds.
\end{aligned} \tag{3.5}$$

It will be seen that the sup over any finite time interval of the absolute value
of the last term of (3.5) goes to zero in mean square, by virtue of (3.1). For
small $\lambda > 0$, that term can be rewritten as (modulo $O(\lambda)$)

$$\begin{aligned}
&\sum_{l=0}^{[t/\lambda]} \int_{l\lambda}^{l\lambda+\lambda} \int_U b(x^\epsilon(l\lambda),\alpha) \left[ r^{\epsilon,\prime}(d\alpha,s) - r'(d\alpha,s) \right] ds \\
&\quad + \sum_{l=0}^{[t/\lambda]} \int_{l\lambda}^{l\lambda+\lambda} \int_U \left[ b(x^\epsilon(s),\alpha) - b(x^\epsilon(l\lambda),\alpha) \right] \left[ r^{\epsilon,\prime}(d\alpha,s) - r'(d\alpha,s) \right] ds.
\end{aligned} \tag{3.6}$$

Here $[t/\lambda]$ denotes the integer part of $t/\lambda$. As $\lambda \to 0$ the
expectation of the square of the last term of (3.6) goes to zero, uniformly on
any finite time interval, and in $r(\cdot)$, $r^\epsilon(\cdot)$, $x(0)$,
$x^\epsilon(0)$, whether or not (3.1) holds, since

$$\lim_{\lambda \to 0}\ \sup_{x^\epsilon(0)}\ \sup_{r^\epsilon}\ E \sup_{l\lambda \le t}\ \sup_{s \le \lambda} |x^\epsilon(l\lambda + s) - x^\epsilon(l\lambda)|^2 = 0. \tag{3.7}$$

Assumption (3.1) can be used to show that the same uniform limit in mean
square holds for the first term of (3.6) for any $\lambda$, as $\epsilon \to 0$. The
proof of (3.3) is a consequence of these facts and the Lipschitz condition. The
convergence of the costs is a consequence of the convergence of the paths,
controls, and an argument concerning the convergence of the stopping times such
as used in Theorem 2.1. ■


Finite-valued and piecewise constant approximations $r^\epsilon(\cdot)$ in (3.1).
Now some approximations of subsequent interest will be defined. They are just
piecewise constant and finite-valued ordinary admissible controls. Consider the
following discretization of the $U_i$. Let $U_i \subset \mathbb{R}^{d_i}$, Euclidean
$d_i$-space. Given $\mu > 0$, partition $\mathbb{R}^{d_i}$ into disjoint (hyper)cubes
$\{R_i^{\mu,l}\}$ with diameters $\mu$. The boundaries can be assigned to the
subsets in any way. Define $U_i^{\mu,l} = U_i \cap R_i^{\mu,l}$, for the finite
number ($p_i^\mu$) of non-empty intersections. Choose a point
$\alpha_i^{\mu,l} \in U_i^{\mu,l}$. Now, given admissible
$(r_1(\cdot), r_2(\cdot))$, define the approximating admissible relaxed control
$r_i^\mu(\cdot)$ on the control value space
$U_i^\mu = \{\alpha_i^{\mu,l},\ l \le p_i^\mu\}$ by its derivative as
$r_i^{\mu,\prime}(\alpha_i^{\mu,l}, t) = r_i'(U_i^{\mu,l}, t)$. Denote the set of such
controls by $\mathcal{U}_i(\mu)$. The following theorem is a consequence of
Theorem 3.1. A version can also be found in [12].

Theorem 3.2. Assume (A2.1)-(A2.2), and the above approximation of $r_i(\cdot)$ by
$r_i^\mu(\cdot) \in \mathcal{U}_i(\mu)$, $i = 1,2$. Then (3.1), (3.3), and (3.4) hold
for $\mu$ replacing $\epsilon$, no matter what the
$\{U_i^{\mu,l}, \alpha_i^{\mu,l}\}$. The same result holds if we approximate only one
of the $r_i(\cdot)$.
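The finite-valued approximation can be sketched for a scalar control set as follows. This is an illustration under stated assumptions: the control set is the interval `[lo, hi]`, the relaxed control derivative at a fixed time is given as a finite list of `(value, weight)` atoms, the cubes are intervals of length `mu` starting at `lo`, and the representative point of each cell is taken to be its midpoint. The names `quantize_measure` and `integrate` are hypothetical.

```python
import math

def quantize_measure(atoms, mu, lo=-1.0, hi=1.0):
    """Move the mass of each atom (value, weight) to the representative of its cell.

    Cells are intervals of length mu intersected with [lo, hi]; the
    representative is the midpoint of the intersected cell (an assumed choice).
    """
    n_cells = math.ceil((hi - lo) / mu)
    cells = {}
    for value, weight in atoms:
        index = min(int((value - lo) // mu), n_cells - 1)
        left = lo + index * mu
        right = min(left + mu, hi)
        rep = 0.5 * (left + right)
        cells[rep] = cells.get(rep, 0.0) + weight
    return sorted(cells.items())

def integrate(phi, atoms):
    """Integral of phi against the discrete measure given by the atoms."""
    return sum(weight * phi(value) for value, weight in atoms)
```

Each atom moves by at most half a cell, so for a test function with Lipschitz constant one the integrals against the original and the quantized measure differ by at most `mu / 2`. For merely bounded and continuous test functions the difference still goes to zero with `mu`, by uniform continuity on the compact control set, which is the kind of closeness that (3.1) asks for.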


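Finite-valued, piecewise-constant and “delayed” approximations. The
proofs of convergence depend on showing that the cost changes little if the
control actions of any player are discretized in time and slightly delayed. Let
$r_i^\mu(\cdot) \in \mathcal{U}_i(\mu)$, where the control value space for player $i$
is $U_i^\mu$. Let $\Delta > 0$. Define the “backward” differences
$\Delta_{i,k}^{\mu,l} = r_i^\mu(\alpha_i^{\mu,l}, k\Delta) - r_i^\mu(\alpha_i^{\mu,l}, k\Delta - \Delta)$,
$l \le p_i^\mu$, $k = 1, \ldots$. Define the piecewise constant ordinary controls
$u_i^{\mu,\Delta}(\cdot) \in \mathcal{U}_i(\mu)$ on the interval
$[k\Delta, k\Delta + \Delta)$ by

$$u_i^{\mu,\Delta}(t) = \alpha_i^{\mu,l} \quad \text{for } t \in \left[ k\Delta + \sum_{\nu=1}^{l-1} \Delta_{i,k}^{\mu,\nu},\ k\Delta + \sum_{\nu=1}^{l} \Delta_{i,k}^{\mu,\nu} \right). \tag{3.8}$$

Note that on $[k\Delta, k\Delta + \Delta)$, $u_i^{\mu,\Delta}(\cdot)$ takes the value
$\alpha_i^{\mu,l}$ on a time interval of length $\Delta_{i,k}^{\mu,l}$. Note also that
the $u_i^{\mu,\Delta}(\cdot)$ are “delayed,” in that the values of $r_i(\cdot)$ on
$[k\Delta - \Delta, k\Delta)$ determine the values of $u_i^{\mu,\Delta}(\cdot)$ on
$[k\Delta, k\Delta + \Delta)$. Thus $u_i^{\mu,\Delta}(t)$,
$t \in [k\Delta, k\Delta + \Delta)$, is $\mathcal{F}_{k\Delta}$-measurable. Let
$r_i^{\mu,\Delta}(\cdot)$ denote the relaxed control representation of
$u_i^{\mu,\Delta}(\cdot)$, with time derivative $r_i^{\mu,\Delta,\prime}(\cdot)$. Let
$\mathcal{U}_i(\mu,\delta)$ denote the subset of $\mathcal{U}_i(\mu)$ that are
ordinary controls and constant on the intervals $[l\delta, l\delta + \delta)$,
$l = 0, 1, \ldots$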

The intervals $\Delta_{i,k}^{\mu,l}$ in (3.8) are just real numbers. For later use, it is
important to have them be some multiple of some small $\delta > 0$, where $\Delta/\delta$ is an
integer. Consider one method of doing this. Divide $[k\Delta, k\Delta + \Delta)$ into $\Delta/\delta$
subintervals of length $\delta$ each. Working in order $l = 1, 2, \ldots$, to each value $\alpha_i^{\mu,l}$
first assign (the integer part) $[\Delta_{i,k}^{\mu,l}/\delta]$ successive subintervals of length $\delta$. The
total fraction of time that is unassigned on any bounded time interval will go
to zero as $\delta \to 0$, and how control values are assigned to them will have little
effect. However, for specificity for future use consider the following method.
The unassigned length for value $\alpha_i^{\mu,l}$ is
$L_{i,k}^{\mu,\delta,l} = \Delta_{i,k}^{\mu,l} - [\Delta_{i,k}^{\mu,l}/\delta]\delta$, $l \le p_i^{\mu}$. Define
the sum $L_{i,k}^{\mu,\delta} = \sum_l L_{i,k}^{\mu,\delta,l}$, which must be an integral multiple of $\delta$. Then assign
each unassigned $\delta$-interval at random, with value $\alpha_i^{\mu,l}$ chosen with probability
$L_{i,k}^{\mu,\delta,l}/L_{i,k}^{\mu,\delta}$. By Theorem 2.1, this assignment and randomization approximates
the original relaxed control.
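The slot assignment just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `allocate_slots` and its arguments are hypothetical, the durations are assumed to sum to $\Delta$ (an integer multiple of $\delta$), and the randomly assigned slots are placed at the end of the interval, an ordering choice made here for convenience.

```python
import math
import random

def allocate_slots(durations, delta, rng):
    """Assign control-value indices to the delta-slots of one Delta-interval.

    durations[l] is the length of time on which value l should be used.
    The durations are assumed to sum to Delta, a multiple of delta.
    """
    n_slots = round(sum(durations) / delta)
    slots, leftovers = [], []
    for l, d in enumerate(durations):
        whole = math.floor(d / delta + 1e-12)      # integer part [d / delta]
        slots.extend([l] * whole)                  # successive whole slots for value l
        leftovers.append(max(d - whole * delta, 0.0))
    n_free = n_slots - len(slots)
    if n_free > 0:
        # Each unassigned slot gets value l with probability leftover_l / total leftover.
        slots.extend(rng.choices(range(len(durations)), weights=leftovers, k=n_free))
    return slots

if __name__ == "__main__":
    print(allocate_slots([0.45, 0.35, 0.2], 0.1, random.Random(0)))
```

With durations (0.45, 0.35, 0.2) and $\delta = 0.1$, nine slots are assigned deterministically and the single remaining slot goes to the first or second value with equal probability.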

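Let $\mathcal{U}_i(\mu,\delta,\Delta)$ denote the set of such controls. If $u_i^{\mu,\delta,\Delta}(\cdot)$ is obtained from
$r_i(\cdot)$ in this way, then it is a function of $r_i(\cdot)$, but this functional dependence
will be omitted in the notation. Let $r_i^{\mu,\delta,\Delta,\prime}(\cdot)$ denote the time derivative of
$r_i^{\mu,\delta,\Delta}(\cdot)$. As stated in the next theorem, which is a consequence of Theorem
3.1, for fixed $\mu$ and small $\delta$, $u_i^{\mu,\delta,\Delta}(\cdot)$ well approximates the effects of $u_i^{\mu,\Delta}(\cdot)$
and $r_i(\cdot)$, uniformly in $r_i(\cdot)$ and $\{\alpha_i^{\mu,l}\}$. In particular, (3.1) holds in the sense
that for each $\mu > 0$, $\Delta > 0$, and bounded and continuous $\phi_i(\cdot)$, $i = 1,2$,

$$\lim_{\delta \to 0}\ \sup_{r_i \in \mathcal{U}_i} E \sup_{t \le T} \left| \int_0^t \int_{U_i} \phi_i(\alpha_i) \left[ r_i^{\mu,\delta,\Delta,\prime}(d\alpha_i, s) - r_i^{\mu,\Delta,\prime}(d\alpha_i, s) \right] ds \right| = 0. \qquad (3.9)$$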

Theorem 3.3. Assume (A2.1)-(A2.2). Let $r_i(\cdot) \in \mathcal{U}_i$, $i = 1,2$. Given $(\mu,\delta,\Delta) > 0$,
approximate as above the theorem to get $r_i^{\mu,\delta,\Delta}(\cdot) \in \mathcal{U}_i(\mu,\delta,\Delta)$. Then (3.1)
holds for $r_i^{\mu,\delta,\Delta}(\cdot)$ and $(\mu,\delta,\Delta)$ replacing $r_i^{\epsilon}(\cdot)$ and $\epsilon$, respectively. Also, (3.9)
holds. In particular, given $\epsilon > 0$, there are $\mu_\epsilon > 0$, $\delta_\epsilon > 0$, $\Delta_\epsilon > 0$ and $\kappa_\epsilon > 0$,
such that for $\mu \le \mu_\epsilon$, $\delta \le \delta_\epsilon$, $\Delta \le \Delta_\epsilon$ and $\delta/\Delta \le \kappa_\epsilon$,

$$\sup_x \sup_{r_1} \sup_{r_2} \left| W_1(x, r_1, r_2) - W_1(x, r_1, u_2^{\mu,\delta,\Delta}) \right| \le \epsilon. \qquad (3.10)$$

The expression (3.10) holds with the indices 1 and 2 interchanged or if both
controls are approximated.

Consider the discrete-time system (2.5) with either the interpolation that is
piecewise constant or (2.6). Then the $\mu_\epsilon > 0$, $\delta_\epsilon > 0$, $\Delta_\epsilon > 0$ and $\kappa_\epsilon > 0$ can be
defined so that

$$\sup_x \sup_{r_1} \sup_{r_2} \left| W_1^{\Delta}(x, r_1, r_2) - W_1^{\Delta}(x, r_1, u_2^{\mu,\delta,\Delta}) \right| \le \epsilon. \qquad (3.11)$$

Define the delayed controls $r_i^{\Delta}(\cdot)$ by $r_i^{\Delta,\prime}(\cdot, s) = r_i'(\cdot, s - \Delta)$ for $s \in [n\Delta, n\Delta + \Delta)$.
The $r_1(\cdot)$ in $W_1^{\Delta}(x, r_1, u_2^{\mu,\delta,\Delta})$ can be replaced by its approximation. The
expression (3.11) holds with the indices 1 and 2 interchanged or if both controls
are approximated.


Note on the initial values of the controls. Since the controls are delayed
by $\Delta$, we can assign the values on the initial interval $[0, \Delta]$ in any way at all.
Let the values $u_i(l\delta)$, $l\delta < \Delta$, be in $U_i^{\mu}$ and fixed, for $i = 1,2$.


4  Equilibria  and  Approximations 


Elliott-Kalton strategies. The classical definition of strategy as used in
differential games for models such as (2.1) or (2.3) is that of Elliott and Kalton
[4, 7]. A strategy $c_1(\cdot)$ for player 1 is a mapping from $\mathcal{U}_2$ to $\mathcal{U}_1$ with the
following property. If admissible controls $r_2(\cdot)$ and $\tilde r_2(\cdot)$ satisfy $r_2(s) = \tilde r_2(s)$
for $s \le t$, then $c_1(r_2)(s) = c_1(\tilde r_2)(s)$, $s \le t$. The definition for player 2
strategies is analogous. Let $\mathcal{C}_i$ denote the set of such strategies or mappings for
player $i$. An Elliott-Kalton strategy is a generalization of a feedback control.
The current control action that it yields for any player is a function only of the
past control actions, and does not otherwise depend on the form of the strategy
of the other player.

A pair $c_i(\cdot) \in \mathcal{C}_i$, $i = 1,2$, is said to be an $\epsilon$-equilibrium strategy pair if for
any admissible controls $r_i(\cdot)$, $i = 1,2$,^4

$$\begin{aligned} W_1(x, c_1, c_2) &\ge W_1(x, r_1, c_2) - \epsilon, \\ W_2(x, c_1, c_2) &\ge W_2(x, c_1, r_2) - \epsilon. \end{aligned} \qquad (4.1)$$

The notation $W_i(x, c_1, c_2)$ implies that each player $i$ uses its strategy $c_i(\cdot)$.
When writing $W_i(x, c_1, c_2)$, it is assumed that the associated process is well
defined. This will be the case here, since Theorem 3.3 implies that it is sufficient
to use strategies whose control functions are piecewise constant. If (4.1) holds
with $\epsilon = 0$, then we have an equilibrium strategy pair. The controls can be
either ordinary or relaxed. The notation $W_2(x, c_1, r_2)$ implies that player 1 uses
its strategy $c_1(\cdot)$ and player 2 uses the relaxed control $r_2(\cdot)$.

^4 The definition in [4] requires that the controls be progressively measurable, and
not simply measurable and adapted, for each Borel set $A$. But due to the approximation
results of Theorems 3.1-3.3, this added requirement is unnecessary in our case.
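The inequalities (4.1) have a simple finite analog that may help fix ideas. The sketch below is an illustration only, not the setting of the paper: it assumes that each player has finitely many alternatives and that the values are tabulated in hypothetical payoff tables `W1[a][b]` and `W2[a][b]`, so that strategies and deviating controls are both just row or column indices.

```python
def is_eps_equilibrium(W1, W2, c1, c2, eps):
    """Check the finite analog of (4.1) for the pair (c1, c2).

    W1[a][b] and W2[a][b] are the values for players 1 and 2 when
    player 1 uses alternative a and player 2 uses alternative b.
    """
    # Best value player 1 can reach by deviating while player 2 keeps c2.
    best1 = max(W1[a][c2] for a in range(len(W1)))
    # Best value player 2 can reach by deviating while player 1 keeps c1.
    best2 = max(W2[c1][b] for b in range(len(W2[c1])))
    return W1[c1][c2] >= best1 - eps and W2[c1][c2] >= best2 - eps

if __name__ == "__main__":
    W1 = [[3, 0], [5, 1]]
    W2 = [[3, 5], [0, 1]]
    print(is_eps_equilibrium(W1, W2, 1, 1, 0.0))
    print(is_eps_equilibrium(W1, W2, 0, 0, 0.0))
    print(is_eps_equilibrium(W1, W2, 0, 0, 2.0))
```

In this example the pair (1, 1) satisfies the inequalities with $\epsilon = 0$, while the pair (0, 0) satisfies them only once $\epsilon$ is at least 2.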

The above definition of strategy does not properly allow for randomized
controls, where the realized responses given by the strategy of a player to control
processes of the other player that are identical on some interval might differ there,
depending on the random choices that it makes. So we also allow randomized
strategies that have the form of the second line in (4.2) below for either one
or both of the players. Theorem 2.1 shows the connection between relaxed
and randomized controls, so that one can work with relaxed controls in lieu of
randomization, if desired.

We  will  require  the  following  assumption. 


A4.1. For each small $\epsilon > 0$ there is an $\epsilon$-equilibrium Elliott-Kalton strategy
$(c_1^{\epsilon}(\cdot), c_2^{\epsilon}(\cdot))$ under which the solution to (2.1) or (2.3) is well defined.

The  following  approximation  theorem  will  be  a  key  item  in  the  development. 


Theorem 4.1. Assume (A2.1) and (A2.2). Given $\epsilon_1 > 0$, there are positive
numbers $\mu, \delta, \Delta$, where $\Delta/\delta$ is an integer, such that the values for any strategy
pair $(c_1(\cdot), c_2(\cdot))$, with $c_i(\cdot) \in \mathcal{C}_i$, $i = 1,2$, and under which the solution to (2.3) is
well defined^5, can be approximated within $\epsilon_1$ by strategy pairs $c_i^{\mu,\delta,\Delta}(\cdot)$, $i = 1,2$,
of the following form. The realizations of $c_i^{\mu,\delta,\Delta}(\cdot)$ (which depend on the other
player’s strategy or control) are ordinary controls in $\mathcal{U}_i(\mu,\delta,\Delta)$, and we denote
them by $u_i^{\mu,\delta,\Delta}(\cdot)$. For integer $n, k$, and $k\delta \in [n\Delta, n\Delta + \Delta)$ and $\alpha_i$ taking values
in $U_i^{\mu}$,

$$\begin{aligned}
& P\left\{ u_i^{\mu,\delta,\Delta}(k\delta) = \alpha_i \,\big|\, w(s), s \le k\delta;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l < k \right\} \\
&\quad = P\left\{ u_i^{\mu,\delta,\Delta}(k\delta) = \alpha_i \,\big|\, w(l\Delta), l \le n;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right\} \\
&\quad = p_{i,k}\left( \alpha_i;\ w(l\Delta), l \le n;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right),
\end{aligned} \qquad (4.2)$$

which defines the functions $p_{i,k}(\cdot)$. For each positive value of $\mu, \delta, \Delta$, the
functions $p_{i,k}(\cdot)$ can be taken to be continuous in the $w$-arguments, for each value of
the other arguments.

Suppose that the control process realizations for player $i$ are in $\mathcal{U}_i(\mu,\delta,\Delta)$,
but those of the other player are general relaxed controls. Then we interpret
(4.2), applied to that control, as being based on its discretized approximation as
derived above Theorem 3.3.


A convenient representation of the values in (4.2). It will be useful for
the convergence proofs if the random selections implied by the conditional
probabilities in (4.2) were systematized as follows. Let $\{\theta_k\}$ be random variables that
are mutually independent and uniformly distributed on $[0,1]$. The $\{\theta_k, k \ge l\}$
will be independent of all system data before time $l\delta$. For each $i, n, k$, divide
$[0,1]$ into (random) subintervals whose lengths are proportional to the
conditional probability of the $\alpha_i^{\mu,l}$ as given by (4.2), and select $\alpha_i^{\mu,l}$ if the random
selection on $[0,1]$ falls into that subinterval. The same random variables $\{\theta_k\}$
are used for both players, and for all conditional probability rules of the form
(4.2). This representation is used for theoretical purposes only.

^5 One or both of them might be simply fixed relaxed feedback controls.
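The selection rule above is ordinary inverse-distribution sampling driven by one shared uniform variable. A minimal Python sketch follows, assuming the conditional probabilities of (4.2) are already available as a list; the name `select_by_uniform` is hypothetical.

```python
import bisect
import itertools

def select_by_uniform(theta, probs):
    """Return the index l whose subinterval of [0, 1] contains theta.

    probs[l] is the conditional probability of the l-th control value;
    the subintervals are laid out in order with lengths probs[l].
    """
    edges = list(itertools.accumulate(probs))   # right endpoints of the subintervals
    # The clamp guards against theta == 1.0 and floating-point round-off.
    return min(bisect.bisect_right(edges, theta), len(probs) - 1)

if __name__ == "__main__":
    theta = 0.5                                    # one shared uniform sample
    print(select_by_uniform(theta, [0.2, 0.5, 0.3]))   # rule for player 1
    print(select_by_uniform(theta, [0.6, 0.4]))        # rule for player 2
```

Feeding the same `theta` into every rule is what places all of the randomized strategies on a single probability space.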


Proof. By Theorem 3.3, it is sufficient to work with strategies whose control
process realizations are in $\mathcal{U}_i(\mu,\delta,\Delta)$. In any case, let $c_i(\cdot) \in \mathcal{C}_i$, $i = 1,2$, be
any strategies for which the solution to (2.3) is well defined. Then Theorem
3.3 implies that the control process realizations of the strategies can be
approximated by those of a pair of strategies $c_i^{\mu,\delta,\Delta}(\cdot)$, $i = 1,2$, with control process
realizations in $\mathcal{U}_i(\mu,\delta,\Delta)$, $i = 1,2$. To get the $u_i^{\mu,\delta,\Delta}(\cdot)$, player $i$ would start by
calculating the response of $c_i(\cdot)$ to the original strategy of the other player, and
then approximate and possibly delay it as done above Theorem 3.3. [I.e., each of
the original control sequences is replaced by the discretization discussed above
Theorem 3.3.] This approximation is uniform in the original strategies in that
the differences in the cost functions can be made small, uniformly in the
original strategies, for small enough $\mu, \delta, \Delta$. Hence Theorem 3.3 yields the claim.
The $c_i^{\mu,\delta,\Delta}(\cdot)$ are Elliott-Kalton strategies, since they are simply time and space
discretizations of Elliott-Kalton strategies.

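The probability law of $\{w(\cdot), u_i^{\mu,\delta,\Delta}(\cdot), i = 1,2\}$ determines the law of the
corresponding solution to (2.1). The law of evolution of the controls can be
written in recursive form, for $i = 1,2$, and $k\delta \in [n\Delta, n\Delta + \Delta)$,

$$P\left\{ u_i^{\mu,\delta,\Delta}(k\delta) = \alpha_i \,\big|\, w(s), s \le n\Delta;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right\}. \qquad (4.3)$$

This yields a “randomized” Elliott-Kalton strategy pair.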

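Now apply the control rule (4.3) to the piecewise constant interpolation of
the discrete-time system (2.5). The probability law of the solution on $[0,t]$ is
determined by the law of $\{u_1^{\mu,\delta,\Delta}(l\delta), u_2^{\mu,\delta,\Delta}(l\delta), l\delta \le t;\ w(n\Delta), n\Delta \le t\}$. Hence,
for $k\delta \in [n\Delta, n\Delta + \Delta)$, the probability law of the controls and paths for
this discrete-time system can be determined from the formula

$$P\left\{ u_i^{\mu,\delta,\Delta}(k\delta) = \alpha_i \,\big|\, w(l\Delta), l \le n;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right\}. \qquad (4.4)$$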


By Theorem 3.3, for small enough $\delta, \Delta$ the path $x^{\Delta}(\cdot)$ is arbitrarily close
(uniformly in the original strategies or controls $c_i(\cdot)$, $i = 1,2$) to the path $x(\cdot)$, both
under $c_i^{\mu,\delta,\Delta}(\cdot)$, $i = 1,2$, where we can suppose (without loss of generality) that
the law of evolution of the controls takes the form (4.4). By the same theorem
and the construction of the $c_i^{\mu,\delta,\Delta}(\cdot)$, $i = 1,2$, for small enough $\mu, \delta, \Delta$ this latter
path is arbitrarily close (uniformly in the original $c_i(\cdot)$, $i = 1,2$) to the path $x(\cdot)$
under the original $c_i(\cdot)$, $i = 1,2$. This argument implies the use of the samples
$w(l\Delta)$ in (4.2).
Now turn to the assertion concerning continuity in the $w$-arguments. (See
also [15, Theorem 10.3.1] on this point.) For $\rho > 0$, consider the smoothed
conditional probability defined by

$$\begin{aligned}
& p_{i,k}^{\rho}\left( \alpha_i;\ w(l\Delta), l \le n;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right) \\
&\quad = N(\rho) \int e^{-|w - z|^2/(2\rho)}\, p_{i,k}\left( \alpha_i;\ z;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right) dz,
\end{aligned} \qquad (4.5)$$

where $N(\rho)$ is a normalizing constant and $w = \{w(l\Delta), l \le n\}$. The variable
$z$ has the same dimension as $w$. The integral is continuous in the $w$-variables,
uniformly in the others. Also, as $\rho \to 0$ it converges to

$$p_{i,k}\left( \alpha_i;\ w(l\Delta), l \le n;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1,2, l\delta < n\Delta \right)$$

for almost all $w$-values. Hence, by Egoroff’s Theorem, it converges almost
uniformly in any compact set. For almost all $w$-values the smoothed conditional
probability will choose the same control values as would the original rule defined
by (4.3) with a probability that goes to unity as $\rho \to 0$. Hence, without loss
of generality we can suppose that the $p_{i,k}(\cdot)$ are smooth in the $w$-variables, as
asserted. ■


5  The Markov Chain Approximation: Brief Review and Approximations

5.1  The  Markov  Chain  Approximation  Method 

We  will  start  by  giving  a  quick  overview  of  the  Markov  chain  approximation 
method  of  [10,  11,  15],  starting  with  some  comments  for  the  case  where  there 
is  only  one  player.  We  will  then  develop  some  approximation  results  that  are 
analogous to those in Theorem 3.3, and which will be crucial for the convergence
theorems in Section 6. The method consists of two steps. Let $h > 0$ be an
approximation  parameter.  The  first  step  is  the  determination  of  a  finite-state 
controlled  Markov  chain  that  has  a  continuous-time  interpolation  that  is  an 
“approximation”  of  the  process  x(-).  The  second  step  solves  the  optimization 
problem  for  the  chain  and  a  cost  function  that  approximates  the  one  used  for 
x(-).  Under  a  natural  “local  consistency”  condition,  the  minimal  cost  function 
for  the  chain  converges  to  the  minimal  cost  function  for  the  original  problem. 
In  applications,  the  optimal  control  for  the  original  problem  is  also  approxi¬ 
mated.  The  approximating  chain  and  local  consistency  conditions  are  the  same 
for  the  game  problem.  The  reference  [15]  contains  a  comprehensive  discussion 
of  many  automatic  and  simple  methods  for  getting  the  transition  probabilities 
of  the  chain.  The  approximations  “stay  close”  to  the  physical  model  and  can 
be  adjusted  to  exploit  local  features. 

The simplest state space for the chain for our model (and the one that we
will use for simplicity in the discussion) is based on the regular $h$-grid $S_h$ in $\mathbb{R}^v$.
Define $G_h = S_h \cap G$ and $G_h^0 = S_h \cap G^0$. It is only the points in $G_h^0 \cup \partial G_h$ that
are of interest. On $G_h^0$ the chain “approximates” the diffusion part of (2.1) or
(2.3). Let $\partial G_h$ denote the points in $S_h - G_h^0$ that can be reached in one step
from $G_h^0$ under some control. These are the boundary points, and the process
stops on first reaching them.

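Next we define the basic condition of local consistency for the part of a chain
that is on $G_h^0$. Let $u_n^h = (u_{1,n}^h, u_{2,n}^h)$ denote the controls that are used at step
$n$. Define $\Delta\xi_n^h = \xi_{n+1}^h - \xi_n^h$ and let $E_{x,n}^{h,\alpha}$ denote the expectation given the data
to step $n$ (when $\xi_n^h$ has just been computed) with $\xi_n^h = x$ and control value
$\alpha = u_n^h$ to be used on the next step. For the game problem, $\alpha = (\alpha_1, \alpha_2)$ with
$\alpha_i \in U_i$. Define $a(x) = \sigma(x)\sigma'(x)$. Suppose that there is a function $\Delta t^h(\cdot)$ (this
is obtained automatically when the transition probabilities are calculated; see
[15] and the example below) such that (this defines the functions $b^h(\cdot)$ and $a^h(\cdot)$)

$$\begin{aligned}
E_{x,n}^{h,\alpha} \Delta\xi_n^h &= b^h(x,\alpha)\Delta t^h(x,\alpha) = b(x,\alpha)\Delta t^h(x,\alpha) + o(\Delta t^h(x,\alpha)), \\
\mathrm{cov}_{x,n}^{h,\alpha}\left[ \Delta\xi_n^h - E_{x,n}^{h,\alpha}\Delta\xi_n^h \right] &= a^h(x,\alpha)\Delta t^h(x,\alpha) = a(x)\Delta t^h(x,\alpha) + o(\Delta t^h(x,\alpha)), \\
\lim_{h \to 0} \sup_{x,\alpha} \Delta t^h(x,\alpha) &= 0.
\end{aligned} \qquad (5.1)$$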

It can be seen that the chain has the “local properties” (conditional mean change
and conditional covariance) of the diffusion process.^6 One can always select the
transition probabilities such that the intervals $\Delta t^h(x,\alpha)$ do not depend on the
control variable, although the general theory in [15] does not require it. Such a
simplification is often done in applications to simplify the coding. Let $p^h(x, y|\alpha)$
denote the probability that the next state is $y$ given that the current state is $x$
and control pair $\alpha = (\alpha_1, \alpha_2)$ is used.

Under our condition that the controls are separated in $b(\cdot)$, in that $b(x,\alpha) =
b_1(x,\alpha_1) + b_2(x,\alpha_2)$, if desired one can construct the chain so that the controls
are “separated” in that the one-step transition probability has the form

$$p^h(x, y|\alpha) = p_1^h(x, y|\alpha_1) + p_2^h(x, y|\alpha_2). \qquad (5.2)$$


A useful representation of the transition probabilities. It is useful to
have the chains for each $h$ defined on the same probability space, no matter
what the controls. This is done as follows. Let $\{\chi_n\}$ be a sequence of mutually
independent random variables, uniformly distributed on the interval $[0,1]$ and
such that $\{\chi_l, l \ge n\}$ is independent of $\{\xi_l^h, u_l^h, l \le n\}$. For each value of $x = \xi_n^h$,
$\alpha = u_n^h$, arrange the finite number of possible next states $y$ in some
order and divide the interval $[0,1]$ into successive subintervals whose lengths are
$p^h(x, y|\alpha)$. Then for $x = \xi_n^h$, $\alpha = u_n^h$, select the next state according to where the
(uniformly distributed) random choice for $\chi_n$ falls. The same random variables
$\{\chi_n\}$ will be used in all cases, for all controls and values of $h$. This representation
is used for theoretical purposes only.

^6 Whether the chain is Markovian or not depends on the form of the control that is applied.
But the transition probability will always be locally consistent.
An example of an approximating chain. The simplest case for illustrative
purposes is one-dimensional and where $h$ is small enough so that $h|b(x,\alpha)| \le
\sigma^2(x)$. Then we can use the transition probabilities and interval, for $x \in G_h^0$ [15,
Chapter 5],

$$p^h(x, x \pm h|\alpha) = \frac{\sigma^2(x) \pm h\,b(x,\alpha)}{2\sigma^2(x)}, \qquad \Delta t^h(x,\alpha) = \frac{h^2}{\sigma^2(x)}. \qquad (5.3)$$
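Local consistency of the one-dimensional chain (5.3) is easy to check numerically. The sketch below is illustrative only: the function names and the sample drift and diffusion coefficients are assumptions, not taken from the paper. It computes the one-step mean and variance of the chain, each divided by the interpolation interval, so that they can be compared with $b(x,\alpha)$ and $\sigma^2(x)$ as in (5.1).

```python
def chain_step(x, alpha, h, b, sigma):
    """Transition probabilities to x + h and x - h and the interval of (5.3)."""
    s2 = sigma(x) ** 2
    drift = h * b(x, alpha)
    if abs(drift) > s2:
        raise ValueError("h is too large: need h*|b(x, alpha)| <= sigma(x)**2")
    p_up = (s2 + drift) / (2.0 * s2)
    p_down = (s2 - drift) / (2.0 * s2)
    dt = h * h / s2
    return p_up, p_down, dt

def local_moments(x, alpha, h, b, sigma):
    """One-step conditional mean and variance of the chain, each divided by dt."""
    p_up, p_down, dt = chain_step(x, alpha, h, b, sigma)
    mean = h * p_up - h * p_down
    var = h * h * (p_up + p_down) - mean ** 2
    return mean / dt, var / dt

if __name__ == "__main__":
    b = lambda x, a: -x + a          # hypothetical drift
    sigma = lambda x: 1.0            # hypothetical constant diffusion coefficient
    print(local_moments(0.5, 0.2, 0.01, b, sigma))
```

The scaled mean reproduces $b(x,\alpha)$ exactly, while the scaled variance equals $\sigma^2(x) - b^2(x,\alpha)\Delta t^h(x,\alpha)$, which differs from $\sigma^2(x)$ by a term that vanishes with $h$, in agreement with the $o(\Delta t^h)$ terms in (5.1).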


Admissible controls. Let $\mathcal{F}_n^h$ denote the minimal $\sigma$-algebra that measures the
control and state data to step $n$, and let $E_n^h$ denote the expectation conditioned
on $\mathcal{F}_n^h$. An admissible control for player $i$ at step $n$ is a $U_i$-valued random
variable that is $\mathcal{F}_n^h$-measurable. Let $\mathcal{U}_i^h$ denote the set of the admissible control
processes for player $i$.

A relaxed control for the chain can be defined as follows. Let $r_{i,n}^h(\cdot)$ be a
distribution on the Borel sets of $U_i$ such that $r_{i,n}^h(A)$ is $\mathcal{F}_n^h$-measurable for each
Borel set $A \subset U_i$. Then the $r_{i,n}^h(\cdot)$ are said to be relaxed controls for player $i$ at
step $n$. As for the model (2.3), an ordinary control at step $n$ can be represented
by the relaxed control at step $n$ defined by $r_{i,n}^h(A) = I_{\{u_{i,n}^h \in A\}}$ for each Borel
set $A \subset U_i$. Define $r_n^h(\cdot)$ by $r_n^h(A_1 \times A_2) = r_{1,n}^h(A_1)\,r_{2,n}^h(A_2)$, where the $A_i$ are
Borel sets in $U_i$. The associated transition probability is $\int_U p^h(x, y|\alpha)\,r_n^h(d\alpha)$.
If $r_{i,n}^h(A)$ can be written as a measurable function of $\xi_n^h$ for each Borel set $A$,
then the control is said to be relaxed feedback. Under any feedback (or relaxed
feedback or randomized feedback) control, the process $\xi_n^h$ is a Markov chain.
More general controls, under which there is more “past” dependence and the
chain is not Markovian, will be used as well. Let $\mathcal{C}_i^h$ denote the set of control
strategies for player $i$ for the chain.


The cost function. Discretize the costs as follows. The cost functions are
the analogs of (2.2) or (2.4). The cost rate for player $i$ is $k_i(x,\alpha_i)\Delta t^h(x)$.
The stopping costs are $g_i(\cdot)$, and $N_h$ denotes the first time that the set $G_h^0$ is
exited. Let $W_i^h(x, u_1^h, u_2^h)$ denote the expected cost for player $i$ under the control
sequences $u_i^h = \{u_{i,n}^h, n \ge 0\}$, $i = 1,2$. The numerical problem is to solve the
game problem for the approximating chain.

Continuous-time interpolations. The discrete-time chain $\xi_n^h$ is used for
the numerical computations. However, for the proofs of convergence, we use
a continuous-time interpolation $\xi^h(\cdot)$ of $\{\xi_n^h\}$ that will approximate $x(\cdot)$. This
will be a continuous-time process that is constructed as follows. Define $\Delta t_n^h =
\Delta t^h(\xi_n^h, u_n^h)$ and $t_n^h = \sum_{i=0}^{n-1} \Delta t_i^h$. Define $\xi^h(t) = \xi_n^h$ on $[t_n^h, t_{n+1}^h)$. Define the
continuous-time interpolations $u_i^h(\cdot)$ of the control actions for player $i$ by $u_i^h(t) =
u_{i,n}^h$, $t_n^h \le t < t_{n+1}^h$, and let its (continuous time) relaxed control representation
be denoted by $r_i^h(\cdot)$. Define $r^h(\cdot) = (r_1^h(\cdot), r_2^h(\cdot))$, with time derivative $r^{h,\prime}(\cdot)$.
We use $\mathcal{U}_i^h$ for the set of continuous time interpolations of the control for player
$i$ as well. Let $\tau_h$ denote the first exit time from $G^0$.

An alternative interpolation. In [15] an interpolation called $\psi^h(\cdot)$ was used
as well, and had some advantages in simplifying the proofs there. We describe it
briefly so that the convergence results of [15] can be used where needed. For each
$h$, let $\nu_n^h$, $n = 0, 1, \ldots$, be mutually independent and exponentially distributed
random variables with unit mean, and that are independent of $\{\xi_n^h, u_n^h, n \ge 0\}$.
Define $\Delta\tau_n^h = \nu_n^h \Delta t_n^h$ and $\tau_n^h = \sum_{i=0}^{n-1} \Delta\tau_i^h$. Define $\psi^h(t) = \xi_n^h$ and $u_\psi^h(t) = u_n^h$
on $[\tau_n^h, \tau_{n+1}^h)$. Now decompose $\psi^h(\cdot)$ in terms of the continuous-time
compensator and martingale. Since the intervals between jumps are $\Delta t_n^h \nu_n^h$, where $\nu_n^h$ is
exponentially distributed and independent of $\mathcal{F}_n^h$, the jump rate of $\psi^h(\cdot)$ when
in state $x$ and under control value $\alpha$ is $1/\Delta t^h(x,\alpha)$. Given a jump, the
distribution of the next state is given by the $p^h(x, y|\alpha)$, and the conditional mean
change is $b^h(x,\alpha)\Delta t^h(x,\alpha)$. So we can write

$$\psi^h(t) = x(0) + \int_0^t b^h(\psi^h(s), u_\psi^h(s))\,ds + M^h(t), \qquad (5.4)$$

where the martingale $M^h(t)$ has quadratic variation process $\int_0^t a^h(\psi^h(s), u_\psi^h(s))\,ds$.
Under any feedback (or randomized feedback) control, the process $\psi^h(\cdot)$ is a
continuous-time Markov chain.

It can be shown that ([15, Sections 5.7.3 and 10.4.1]) there is a martingale
$w^h(\cdot)$ (with respect to the filtration generated by the state and control processes,
possibly augmented by an “independent” Wiener process) such that

$$M^h(t) = \int_0^t \sigma^h(\psi^h(s), u_\psi^h(s))\,dw^h(s) = \int_0^t \sigma(\psi^h(s))\,dw^h(s) + \epsilon^h(t), \qquad (5.5)$$

where $\sigma^h(\cdot)[\sigma^h(\cdot)]' = a^h(\cdot)$ (recall the definition of $a^h(\cdot)$ in (5.1)), $w^h(\cdot)$ has
quadratic variation $It$ and converges weakly to a standard Wiener process. The
martingale $\epsilon^h(\cdot)$ is due to the difference between $\sigma(x)$ and $\sigma^h(x,\alpha)$ (recall the
$o(\Delta t^h)$ terms in (5.1)) and

$$\limsup_{h \to 0}\ \sup_{s \le t} E|\epsilon^h(s)|^2 = 0 \qquad (5.6)$$

for each $t$. Thus

$$\psi^h(t) = x(0) + \int_0^t \int_U b^h(\psi^h(s), \alpha)\,r_\psi^{h,\prime}(d\alpha, s)\,ds + \int_0^t \sigma(\psi^h(s))\,dw^h(s) + \epsilon^h(t). \qquad (5.7)$$

The interpolations $\xi^h(\cdot)$ and $\psi^h(\cdot)$ are asymptotically equivalent, as seen in
the following theorem, so that any asymptotic results for one are also asymptotic
results for the other. We will use $\xi^h(\cdot)$.

Theorem 5.1. Assume the local consistency (5.1). Then the time scales with
intervals $\Delta t_n^h$ and $\Delta\tau_n^h$ are asymptotically equivalent.

Proof. Let $f^h(t) = \min\{n : t_n^h \ge t\}$. Write $\Delta t_n^h - \Delta\tau_n^h = (1 - \nu_n^h)\Delta t_n^h$, a
martingale difference. By the martingale property we have

[Display: a bound on $E \sup$ of the squared partial sums $\sum_{i=0}^{n} (\Delta t_i^h - \Delta\tau_i^h)$, $n \le f^h(t)$; the right-hand side did not survive extraction.]

which goes to zero as $h \to 0$ by the last line of (5.1). The result is the same if
we define $f^h(t) = \min\{n : \tau_n^h \ge t\}$. ■
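The asymptotic equivalence of the two time scales can be observed in a small simulation. The Python sketch below is an illustration under simplifying assumptions: the interval is taken to be the constant $h^2/\sigma^2$ of (5.3) with constant $\sigma$, and the function name `max_time_gap` is hypothetical. It returns the largest difference between the deterministic times $t_n^h$ and the exponentially randomized times $\tau_n^h$ over a fixed horizon.

```python
import random

def max_time_gap(h, sigma2, horizon, rng):
    """Largest |t_n - tau_n| over the steps needed for t_n to reach the horizon."""
    dt = h * h / sigma2                      # interval of (5.3) with constant sigma
    t = tau = gap = 0.0
    while t < horizon:
        t += dt                              # deterministic interval
        tau += dt * rng.expovariate(1.0)     # unit-mean exponential multiple of dt
        gap = max(gap, abs(t - tau))
    return gap

if __name__ == "__main__":
    rng = random.Random(1)
    for h in (0.1, 0.03, 0.01):
        print(h, max_time_gap(h, 1.0, 1.0, rng))
```

For this constant-interval case the standard deviation of $t_n^h - \tau_n^h$ at the end of the horizon is $\sqrt{\Delta t^h \cdot \text{horizon}}$, so the gap shrinks in proportion to $h$.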

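By (5.1), we can write

$$\Delta\xi_n^h = b^h(\xi_n^h, u_n^h)\Delta t_n^h + \beta_n^h,$$

where $\beta_n^h$ is a martingale difference with $E_n^h[\beta_n^h][\beta_n^h]' = a^h(\xi_n^h, u_n^h)\Delta t_n^h$. There
are martingale differences $\delta w_n^h$ with conditional (given $\mathcal{F}_n^h$) covariance $\Delta t_n^h I$
such that [15, Section 10.4.1], [10, Section 6.6] $\beta_n^h = \sigma^h(\xi_n^h, u_n^h)\delta w_n^h$. Let $w^h(\cdot)$
denote the continuous time interpolation of $\sum_{i=0}^{n-1} \delta w_i^h$ with intervals $\Delta t_n^h$. Then,
abusing notation, we can write

$$\begin{aligned}
\xi^h(t) &= x(0) + \int_0^t b^h(\xi^h(s), u^h(s))\,ds + \int_0^t \sigma^h(\xi^h(s), u^h(s))\,dw^h(s), \\
\int_0^t \sigma^h(\xi^h(s), u^h(s))\,dw^h(s) &= \int_0^t \sigma(\xi^h(s))\,dw^h(s) + \epsilon^h(t),
\end{aligned} \qquad (5.8)$$

where $\epsilon^h(\cdot)$ satisfies (5.6) and is due to the $o(\Delta t^h)$ approximation of $a^h(x,\alpha)$
by $\sigma(x)\sigma(x)'$.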


Note on convergence. For any subsequence $h \to 0$, there is a further
subsequence (also indexed by $h$ for simplicity) such that $(\xi^h(\cdot), r_1^h(\cdot), r_2^h(\cdot), w^h(\cdot), \tau_h)$
converges weakly to random processes $(x(\cdot), r_1(\cdot), r_2(\cdot), w(\cdot), \tau)$, where $r_i(\cdot)$ is
a relaxed control for player $i$, $(x(\cdot), r_1(\cdot), r_2(\cdot), \tau)$ is nonanticipative
with respect to the standard vector-valued Wiener process $w(\cdot)$, and, writing
$r(\cdot) = (r_1(\cdot), r_2(\cdot))$, the set satisfies

$$x(t) = x(0) + \int_0^t \int_U b(x(s), \alpha)\,r'(d\alpha, s)\,ds + \int_0^t \sigma(x(s))\,dw(s).$$

Also, $W_i^h(x, r_1^h, r_2^h) \to W_i(x, r_1, r_2)$. The proofs of these facts are the same as
for the one-player control case in [15, Chapter 10].


On the construction of $w^h(\cdot)$: A special case. Full details for the general
method of constructing $w^h(\cdot)$ are in [15, Section 10.4.1], [10, Section 6.6]. To
illustrate the idea we will consider a very common case, and one that will be
needed in Theorems 5.2, 5.3, 5.4, 5.6 and 6.2. Suppose that $\sigma(\cdot) = \sigma$ is a
constant. Suppose that the components of $x$ can be partitioned as $x = (x^1, x^2)$,
and $\sigma$ can be partitioned as

$$\sigma = \begin{bmatrix} \sigma^1 & 0 \\ 0 & 0 \end{bmatrix},$$

where the dimension of $x^1$ is $d^1$
and $\sigma^1$ is a square and invertible matrix of dimension $d^1$. Partition the $a^h(\cdot)$ in
the second line of (5.1) as

$$a^h(x,\alpha) = \begin{bmatrix} a^{1,h}(x,\alpha) & a^{1,2,h}(x,\alpha) \\ a^{2,1,h}(x,\alpha) & a^{2,h}(x,\alpha) \end{bmatrix}.$$

As $h \to 0$, $a^{1,h}(x,\alpha) \to \sigma^1[\sigma^1]'$
and all other components go to zero, all uniformly in $(x,\alpha)$.


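Write the analogous partition $w^h(\cdot) = (w^{1,h}(\cdot), w^{2,h}(\cdot))$. For any Wiener process
$w^2(\cdot)$ that is independent of the other random variables, we can let $w^{2,h}(\cdot) =
w^2(\cdot)$. The only important component of $w^h(\cdot)$ is $w^{1,h}(\cdot)$, and we can write

$$\delta w_n^{1,h} = [\sigma^1]^{-1} \left[ \xi_{n+1}^{1,h} - \xi_n^{1,h} - \int_{t_n^h}^{t_{n+1}^h} \int_U b^1(\xi_n^h, \alpha)\,r^{h,\prime}(d\alpha, s)\,ds \right] + \delta\epsilon_n^{1,h}, \qquad (5.9)$$

where $\delta\epsilon_n^{1,h}$ is due to the approximation of $\sigma^{1,h}$ by $\sigma^1$ and its interpolation
satisfies (5.6). If an ordinary control is used, then the double integral is just
$b^1(\xi_n^h, u_n^h)\Delta t_n^h$.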


5.2  First  approximations  to  the  Chain 

Approximation results analogous to those of Theorems 3.1-3.3 can be proved
and will be useful. These approximations have an independent interest and
should be quite useful for other convergence and approximation analyses for
numerical approximations. Theorem 5.2 concerns an approximation to (5.8)
that is based on the same $w^h(\cdot)$ process, and will be used in Theorem 6.1.
The $w^h(\cdot)$ process depends on the control. For the constant $\sigma$-case, Theorem
5.3 shows that this control dependence is small and can be factored out, and
(uniform in the control) approximations in terms of an i.i.d. driving sequence
are developed. Once this control dependence is factored out, more convenient
approximations to the chain can be obtained, and this is done in Theorem 5.4.

Consider the representation (5.8), and for $\mu, \delta, \Delta$ as used in Theorem 3.3 and
the $r^h(\cdot) = (r_1^h(\cdot), r_2^h(\cdot))$ in (5.8), define the approximation $u_i^{\mu,\delta,\Delta,h}(\cdot)$, $i = 1,2$,
analogously to what was done above Theorem 3.3. For the process $w^h(\cdot)$ that
appears in (5.8) under the original control $r^h(\cdot)$, define the process

$$\xi^{\mu,\delta,\Delta,h}(t) = x(0) + \int_0^t b(\xi^{\mu,\delta,\Delta,h}(s), u^{\mu,\delta,\Delta,h}(s))\,ds + \int_0^t \sigma(\xi^{\mu,\delta,\Delta,h}(s))\,dw^h(s). \qquad (5.10)$$

Let $r_i^{\mu,\delta,\Delta,h}(\cdot)$ denote the relaxed control representation of $u_i^{\mu,\delta,\Delta,h}(\cdot)$. The
process defined by (5.10) is not a Markov chain even if the controls are feedback,
since the $w^h(\cdot)$ is obtained from the process (5.8) under $r^h(\cdot)$ and not under
the $r_i^{\mu,\delta,\Delta,h}(\cdot)$, $i = 1,2$. Let $W_i^{\mu,\delta,\Delta,h}(x, r_1^{\mu,\delta,\Delta,h}, r_2^{\mu,\delta,\Delta,h})$ denote the cost for the
process (5.10). Define the discrete time system

$$\begin{aligned}
\xi_{\Delta}^{\mu,\delta,\Delta,h}(n\Delta + \Delta) ={}& \xi_{\Delta}^{\mu,\delta,\Delta,h}(n\Delta) + \int_{n\Delta}^{n\Delta+\Delta} b(\xi_{\Delta}^{\mu,\delta,\Delta,h}(n\Delta), u^{\mu,\delta,\Delta,h}(s))\,ds \\
&+ \sigma(\xi_{\Delta}^{\mu,\delta,\Delta,h}(n\Delta))\left[ w^h(n\Delta + \Delta) - w^h(n\Delta) \right],
\end{aligned} \qquad (5.11)$$

with initial condition $x(0)$ and piecewise-constant continuous-time interpolation
denoted by $\xi_{\Delta}^{\mu,\delta,\Delta,h}(\cdot)$. Let $W_{i,\Delta}^{\mu,\delta,\Delta,h}(x, r_1^{\mu,\delta,\Delta,h}, r_2^{\mu,\delta,\Delta,h})$ denote the associated
cost. We have the following analog of Theorem 3.3.


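Theorem 5.2. Assume (A2.1). Given $(\mu,\delta,\Delta) > 0$, approximate $r_i^h(\cdot)$ as
noted above to get $r_i^{\mu,\delta,\Delta,h}(\cdot)$. Given $\epsilon > 0$, there are $\mu_\epsilon > 0$, $\delta_\epsilon > 0$, $\Delta_\epsilon > 0$ and
$\kappa_\epsilon > 0$, such that for any $t < \infty$, $\mu \le \mu_\epsilon$, $\delta \le \delta_\epsilon$, $\Delta \le \Delta_\epsilon$ and $\delta/\Delta \le \kappa_\epsilon$,

$$\limsup_{(\mu,\delta,\Delta) \to 0} E \sup_{s \le t} \left| \xi^{\mu,\delta,\Delta,h}(s) - \xi^h(s) \right| = 0. \qquad (5.12)$$

If (A2.2) holds in addition, then

$$\limsup_{h \to 0}\ \sup_{x, r_1^h, r_2^h} \left| W_i^h(x, r_1^h, r_2^h) - W_i^{\mu,\delta,\Delta,h}(x, r_1^{\mu,\delta,\Delta,h}, r_2^{\mu,\delta,\Delta,h}) \right| \le \epsilon. \qquad (5.13)$$

The expressions (5.12) and (5.13) hold if only one of the controls is
approximated, and also if the process (5.10) and its cost are replaced by the
interpolation of the discrete time system and its associated cost, resp.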


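Comments on the proof. For notational simplicity in the proof drop the
superscripts $\mu, \delta$. Define $\delta\xi^{\Delta,h}(t) = \xi^{\Delta,h}(t) - \xi^h(t)$. Then, following the procedure
of Theorem 3.1, write

$$\begin{aligned}
\delta\xi^{\Delta,h}(t) ={}& \int_0^t \int_U \left[ b(\xi^{\Delta,h}(s), \alpha) - b^h(\xi^h(s), \alpha) \right] r^{h,\prime}(d\alpha, s)\,ds \\
&+ \int_0^t \left[ \sigma(\xi^{\Delta,h}(s)) - \sigma(\xi^h(s)) \right] dw^h(s) \\
&+ \int_0^t \int_U b(\xi^{\Delta,h}(s), \alpha) \left[ r^{\Delta,h,\prime}(d\alpha, s) - r^{h,\prime}(d\alpha, s) \right] ds + \epsilon_1^h(t).
\end{aligned}$$

The stochastic integrals are martingales with respect to the filtration induced by the
data $(\xi^h(\cdot), r^h(\cdot), w^h(\cdot))$, $w^h(\cdot)$ has quadratic variation $It$ and $\epsilon_1^h(\cdot)$ satisfies (5.6).
Partition the last integral analogously to what was done in (3.6), with intervals
$\lambda$. The process $\xi^{\Delta,h}(\cdot)$ satisfies the following version of (3.7): For any $t > 0$ and
small $\kappa > 0$ there is $h_\kappa > 0$ such that for $h \le h_\kappa$,

$$\limsup_{\lambda \to 0}\ \sup_{r^h} E \sup_{l\lambda \le t}\ \sup_{s \le \lambda} \left| \xi^{\Delta,h}(l\lambda + s) - \xi^{\Delta,h}(l\lambda) \right|^2 \le \kappa.$$

Now, using the martingale property and the Lipschitz condition one proceeds
in the same way that would be used for approximations to (2.3) in Theorem
3.1. For example, for some constant $K$, we have the inequality

$$\begin{aligned}
E \sup_{s \le t} |\delta\xi^{\Delta,h}(s)|^2 \le{}& K \int_0^t E|\delta\xi^{\Delta,h}(s)|^2\,ds + \kappa^{\lambda,h}(t) \\
&+ K E \left| \sum_{l=0}^{t/\lambda} \int_{l\lambda}^{l\lambda+\lambda} \int_U b(\xi^{\Delta,h}(l\lambda), \alpha) \left[ r^{\Delta,h,\prime}(d\alpha, s) - r^{h,\prime}(d\alpha, s) \right] ds \right|^2,
\end{aligned}$$

where $\sup_{s \le t} \kappa^{\lambda,h}(s) \to 0$ as $\lambda \to 0$, $h \to 0$. For each small $\lambda$, the last term in the
above expression goes to zero uniformly in $h$ as $(\mu,\delta,\Delta) \to 0$, by the method of
approximation of the controls. Then (5.12) follows from the resulting inequality
and the Bellman-Gronwall Lemma. The inequality (5.13) follows from (5.12)
and (A2.2). ■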


5.3  Representations  and  Approximations  of  the  Chain  With 
Control-Independent  Driving  Noise 


The driving noise $w^h(\cdot)$ depends on the path and control. In Section 6 it will
be useful to have approximations to $\xi^h(\cdot)$ (uniform in the control and initial
condition) where the driving noise increments are independent of the path and
control. To accomplish this we will need to factor $w^h(\cdot)$ as $w^h(\cdot) = \tilde w^h(\cdot) + \zeta^h(\cdot)$,
where $\tilde w^h(\cdot)$ does not depend on the control and $\zeta^h(\cdot)$ is “asymptotically
negligible.”  We  will  work  with  the  model  described  at  the  end  of  the  first 


subsection  of  this  section,  where  a  = 


0  0 


,  the  dimension  of  is  d} ,  and 


$\sigma_1$ is a square and invertible matrix of dimension $d_1$. The approximation and
representation  results  of  Theorems  5.2,  5.3  and  5.5  below  will  hold  for  such  a 
form.  But  to  simplify  the  notation  and  development,  we  will  work  with  two 
specific  forms,  each  of  which  is  typical  of  a  large  class  of  models  and  numerical 
algorithms.  Case  1  below  arises  when  one  uses  the  so-called  central  difference 
approximation.  Case  2  arises  when  one  uses  a  central  difference  approximation 
for  the  non-degenerate  part  and  a  one-sided  or  “upwind”  approximation  for  the 
degenerate  part  [15,  Chapter  5].  Both  forms  are  locally  consistent. 


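Case 1. Suppose that $d_1 = v$, so that $\sigma$ is invertible. For $a = \sigma\sigma'$, suppose that
$a_{i,i} - \sum_{j: j\ne i}|a_{i,j}| \ge 0$. This condition can be weakened if the approximation
intervals can depend on the coordinate direction, or if linear transformations
of the state space do not pose programming difficulties [15, Chapter 5]. In the
same reference it is seen that canonical forms of the transition probabilities and
interpolation interval have the form, where $q_{i,j} = q_{j,i}$,

\[
\begin{aligned}
p^h(x, x \pm e_i h\,|\,\alpha) &= \frac{q_{i,i} \pm h b_i(x,\alpha)/2}{Q}, \qquad \Delta t^h(x,\alpha) = \Delta t^h = \frac{h^2}{Q},\\
p^h(x, x + e_i h + e_j h\,|\,\alpha) &= p^h(x, x - e_i h - e_j h\,|\,\alpha) = \frac{q_{i,j}^+}{Q},\\
p^h(x, x + e_i h - e_j h\,|\,\alpha) &= p^h(x, x - e_i h + e_j h\,|\,\alpha) = \frac{q_{i,j}^-}{Q},\\
Q &= 2\sum_i q_{i,i} + 2\sum_{i<j}|q_{i,j}|.
\end{aligned}
\tag{5.14}
\]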


The $q_{i,j}$ are defined in terms of the entries of the matrix $\sigma\sigma'$ and are given
in [15, Equation (3.15), Chapter 5]. We suppose that $h$ is small enough so
that all $q_{i,i} - h|b_i(x,\alpha)| > 0$. A simple computation using (5.14) shows that
$b^h(x,\alpha) = b(x,\alpha)$ and $a^h(x,\alpha) = \sigma\sigma' + O(\Delta t^h)$. Also, by (5.14) we can write
$\Delta t^h(x,\alpha) = \Delta t^h$. In one dimension, (5.14) reduces to (5.3), where $q_{1,1} = \sigma^2/2$ is
determined by the local consistency condition (5.1).
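
The one-dimensional statement can be checked with exact rational arithmetic. The following Python sketch assumes the one-dimensional form $p^h(x, x \pm h|\alpha) = 0.5 \pm hb/(2\sigma^2)$, $\Delta t^h = h^2/\sigma^2$ with $q_{1,1} = \sigma^2/2$; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def case1_probs_1d(h, b, sigma2):
    """One-dimensional Case 1 chain: returns (p_up, p_down, dt).

    Assumes q_{1,1} = sigma^2 / 2 and Q = 2 q_{1,1} = sigma^2.
    """
    q11 = sigma2 / 2
    Q = 2 * q11
    p_up = (q11 + h * b / 2) / Q      # = 0.5 + h b / (2 sigma^2)
    p_down = (q11 - h * b / 2) / Q    # = 0.5 - h b / (2 sigma^2)
    dt = h * h / Q                    # interpolation interval h^2 / sigma^2
    return p_up, p_down, dt

def local_moments(h, p_up, p_down):
    """Conditional mean and variance of one step of size +h or -h."""
    mean = h * (p_up - p_down)
    var = h * h * (p_up + p_down) - mean ** 2
    return mean, var

# Hypothetical values, chosen only for illustration.
h, b, sigma2 = Fraction(1, 100), Fraction(3), Fraction(4)
p_up, p_down, dt = case1_probs_1d(h, b, sigma2)
mean, var = local_moments(h, p_up, p_down)
print(mean == b * dt)        # the drift is matched exactly
print(var / dt - sigma2)     # a^h - sigma^2, which is of order dt
```

In this sketch the conditional mean equals $b\,\Delta t^h$ exactly and the conditional variance equals $\sigma^2\Delta t^h - (b\,\Delta t^h)^2$, which is the sense in which $a^h = \sigma^2 + O(\Delta t^h)$.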


Case  2.  Suppose  that  a  can  be  partitioned  as  in  the  last  paragraph  of  the 

"  0  ■ 


first  subsection  of  this  section:  I.e.,  a  = 


0  0 


where  the  dimension  of 


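is $d_1$, and $\sigma_1$ is a square and invertible matrix of dimension $d_1$. The problem
concerns the effect of the degenerate part. The following canonical model for
such cases is motivated by the general model of [15, Chapter 5]. Define
$\bar b = \sup_{x,\alpha}\sum_{i=d_1+1}^{v}|b_i(x,\alpha)|$. Define $\Delta t^h(x,\alpha) = \Delta t^h = h^2/[Q + h\bar b]$. Use the form (5.14)
for $i \le d_1$, with $Q$ replaced by $Q + h\bar b$. For $i = d_1 + 1, \ldots, v$, use

\[
p^h(x, x \pm e_i h\,|\,\alpha) = \frac{h b_i^{\pm}(x,\alpha)}{Q + h\bar b}, \qquad
p^h(x, x\,|\,\alpha) = \frac{h\left[\bar b - \sum_{i=d_1+1}^{v}|b_i(x,\alpha)|\right]}{Q + h\bar b}.
\]

We still have $a^h(x,\alpha) = \sigma\sigma' + O(\Delta t^h)$ and $b^h(x,\alpha) = b(x,\alpha)$.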
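
As a hedged illustration of Case 2, the sketch below builds the transition probabilities for a two-dimensional model in which only the first component is driven by noise, and checks that they sum to one and reproduce the drift. It assumes the central-difference form $(q_{1,1} \pm hb_1/2)/(Q + h\bar b)$ for the first component and the one-sided form above for the second; the function names and parameter values are hypothetical.

```python
from fractions import Fraction

def case2_probs_2d(h, b1, b2, sigma1_sq, b_bar):
    """Transition probabilities, keyed by the state increment, and dt.

    Assumes q_{1,1} = sigma_1^2 / 2, Q = 2 q_{1,1}, and |b2| <= b_bar.
    """
    q11 = sigma1_sq / 2
    Q = 2 * q11
    D = Q + h * b_bar                          # common denominator Q + h * b_bar
    probs = {
        (h, 0): (q11 + h * b1 / 2) / D,        # central difference, component 1
        (-h, 0): (q11 - h * b1 / 2) / D,
        (0, h): h * max(b2, 0) / D,            # upwind, component 2 (b2 > 0)
        (0, -h): h * max(-b2, 0) / D,          # upwind, component 2 (b2 < 0)
        (0, 0): h * (b_bar - abs(b2)) / D,     # remaining mass: no move
    }
    return probs, h * h / D

def mean_increment(probs):
    """Conditional mean of the increment, component by component."""
    m1 = sum(p * dx for (dx, _), p in probs.items())
    m2 = sum(p * dy for (_, dy), p in probs.items())
    return m1, m2

# Hypothetical values, chosen only for illustration.
h, b1, b2 = Fraction(1, 50), Fraction(1), Fraction(-3, 2)
probs, dt = case2_probs_2d(h, b1, b2, sigma1_sq=Fraction(2), b_bar=Fraction(2))
print(sum(probs.values()) == 1)
print(mean_increment(probs) == (b1 * dt, b2 * dt))
```

The self-transition absorbs the probability mass $h[\bar b - |b_2|]/(Q + h\bar b)$, which is what allows the interpolation interval to be independent of the state and control.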


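Theorem 5.3. Use either of the models Case 1 or Case 2 described above.
Then we can write $\delta w_n^h = \delta\tilde w_n^h + \delta\zeta_n^h$, where the components are martingale
differences. The $\delta\tilde w_n^h$ are i.i.d., $\{\delta\tilde w_l^h, l \ge n\}$ is independent of $\{\xi_l^h, u_l^h, l \le n\}$,
and the components have values $O(h)$. Also $E_n^h\,\delta\tilde w_n^h[\delta\tilde w_n^h]' = (h^2/Q)\,I$ and
$E_n^h|\delta\zeta_n^h|^2 = O(h\Delta t^h)$, $E_n^h\,\delta\tilde w_n^h[\delta\zeta_n^h]' = O(h\Delta t^h)$.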

Proof. The proof is a simple construction. The basic approach is to first define
$\delta w_n^h$ as though $b(\cdot) = 0$. The result will define $\delta\tilde w_n^h$. Then $\delta\zeta_n^h$ is defined to
make up the difference. The fact that the dominant terms in the transition
probabilities in (5.14) do not depend on $h$, and that the contributions due to
the drift (hence control and state) are proportional to $h$, makes this possible. To
avoid excessive notation and concentrate on the essential ideas, we start with
Case 1 in one dimension. The treatment of the higher-dimensional model follows
the same pattern, and this is illustrated via a two-dimensional case. Then the
minor modifications that are required for Case 2 are discussed. The procedure
in the general case should be apparent from the three examples.

Case 1, one dimension. Write the double integral term in (5.9) as $b(\xi_n^h, u_n^h)\Delta t^h$,
since $b^h(\cdot) = b(\cdot)$. To construct the state transitions, we will use the representation
in terms of the random variables $\chi_n$ described in the paragraph below
(5.2). In one dimension (5.14) is (5.3) and $p^h(x, x \pm h|\alpha) = 0.5 \pm hb(x,\alpha)/[2\sigma^2]$,
$\Delta t^h = h^2/\sigma^2$. Now, define $\xi_{n+1}^h - \xi_n^h$ by setting it equal to $h$ if the random
sample of $\chi_n$ falls in $[0, .5 + hb(\xi_n^h, u_n^h)/2\sigma^2]$, and set it equal to $-h$ otherwise.
The “conditional mean” change is $2h[hb(\xi_n^h, u_n^h)/2\sigma^2] = b(\xi_n^h, u_n^h)\Delta t^h$, which is
just what is required by the local consistency condition (5.1).

AC  =  IQ  for  Case  1, 
Define the martingale difference term $\delta\tilde w_n^h$ as follows. Divide $[0, 1]$ into the
two segments $[0, .5]$, $(.5, 1]$. If the random sample of $\chi_n$ falls in $[0, .5]$, set $\delta\tilde w_n^h = h/\sigma$,
otherwise set it equal to $-h/\sigma$. It is what $\delta w_n^h$ would be if $b(\cdot) = 0$. Now
define $\delta\zeta_n^h$ to make up for the difference. There are two components to $\delta\zeta_n^h$. One
component is due to the error $a^h(x,\alpha) - \sigma^2 = O(h^2)$. Hence $[a^h(x,\alpha)]^{1/2} - \sigma = O(h^2)$
and the corresponding error in computing the sample values of $\delta w_n^h$ is
$O(h^3)$. The associated interpolated error process clearly satisfies (5.6).

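The second component of $\delta\zeta_n^h$ is due to the neglect of the $b(\cdot)$ in constructing
$\delta\tilde w_n^h$. We handle this as follows. Suppose that $b(\xi_n^h, u_n^h) \ge 0$ (the computation
is analogous if $b(\xi_n^h, u_n^h) < 0$). Then

\[
\delta\zeta_n^h = \left[2h - b(\xi_n^h, u_n^h)\Delta t^h\right]/\sigma \quad \text{if } \chi_n \in [.5,\ .5 + hb(\xi_n^h, u_n^h)/(2\sigma^2)],
\]

and it equals $-b(\xi_n^h, u_n^h)\Delta t^h/\sigma$ otherwise. The conditional variance of $\delta\zeta_n^h$ is

\[
\left[\frac{2h - b(\xi_n^h, u_n^h)\Delta t^h}{\sigma}\right]^2 \frac{hb(\xi_n^h, u_n^h)}{2\sigma^2}
+ \left[\frac{b(\xi_n^h, u_n^h)\Delta t^h}{\sigma}\right]^2 \left(1 - \frac{hb(\xi_n^h, u_n^h)}{2\sigma^2}\right) = O(h)\Delta t^h,
\]

uniformly in the controls. The $\delta\zeta_n^h$ term depends on the control, but the $\delta\tilde w_n^h$
term does not. It is simply a Bernoulli sequence, with $\{\delta\tilde w_l^h, l \ge n\}$ independent
of the data up to step $n$. Also, $E_n^h[\delta\tilde w_n^h]^2 = \Delta t^h$, $E_n^h\,\delta\tilde w_n^h\,\delta\zeta_n^h = O(h)\Delta t^h$ and
$E_n^h[\delta\zeta_n^h]^2 = O(h)\Delta t^h$, uniformly in the controls.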

Now, construct the continuous-time martingales⁸ $\tilde w^h(\cdot)$, $\zeta^h(\cdot)$ by interpolating
the sums $\sum_{i=0}^{n-1}\delta\tilde w_i^h$, $\sum_{i=0}^{n-1}\delta\zeta_i^h$ with intervals $\Delta t^h$. Write
$w^h(t) = \tilde w^h(t) + \zeta^h(t)$. The $\tilde w^h(\cdot)$ does not depend on the control, has quadratic
variation $It$, and $\tilde w^h(s), s \ge t$, is independent of $\xi^h(s), u^h(s), s \le t$. The quadratic
variation of $\zeta^h(\cdot)$ (and its quadratic covariation with $\tilde w^h(\cdot)$) is $O(h)$, uniformly
in the controls and initial condition.
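
The one-dimensional decomposition can be verified by enumerating the three cells into which the sample of $\chi_n$ can fall. The sketch below assumes $b \ge 0$, neglects the small error due to $a^h - \sigma^2$, and uses exact rational arithmetic; the function names and numerical values are hypothetical.

```python
from fractions import Fraction

def decomposition_1d(h, b, sigma):
    """Cells (probability, dw_tilde, dzeta) for the case b >= 0.

    dw_tilde ignores the drift; dzeta makes up the difference, so that
    sigma * (dw_tilde + dzeta) + b * dt equals the state increment (+h or -h).
    """
    assert b >= 0
    dt = h * h / sigma ** 2
    p_extra = h * b / (2 * sigma ** 2)
    half = Fraction(1, 2)
    cells = [
        (half, h / sigma, -b * dt / sigma),                 # chi in [0, .5]: step +h
        (p_extra, -h / sigma, (2 * h - b * dt) / sigma),    # drift cell: step +h
        (half - p_extra, -h / sigma, -b * dt / sigma),      # remainder: step -h
    ]
    return dt, cells

def moment(cells, f):
    """Expectation of f(dw_tilde, dzeta) over the three cells."""
    return sum(p * f(dw, dz) for p, dw, dz in cells)

# Hypothetical values, chosen only for illustration.
h, b, sigma = Fraction(1, 10), Fraction(2), Fraction(3, 2)
dt, cells = decomposition_1d(h, b, sigma)
print(moment(cells, lambda dw, dz: dz))              # conditional mean of dzeta
print(moment(cells, lambda dw, dz: dz * dz) / dt)    # variance of dzeta, per unit dt
```

Under these assumptions both terms have zero conditional mean, $E[\delta\tilde w_n^h]^2 = \Delta t^h$ exactly, and the conditional variance of $\delta\zeta_n^h$ works out to $b\,\Delta t^h(2h - b\,\Delta t^h)/\sigma^2$, which is $O(h)\Delta t^h$.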

Comment on the two-dimensional problem. The following computation
illustrates the procedure in higher dimensions. Let $q_{1,2} \ge 0$ for specificity.
Divide the unit interval into successive subintervals of lengths $q_{1,1}/Q$, $q_{1,1}/Q$,
$q_{2,2}/Q$, $q_{2,2}/Q$, $q_{1,2}/Q$, $q_{1,2}/Q$. Again, the aim is to reproduce the transition
probabilities (5.14). If $\chi_n$ falls in $[0, (q_{1,1} + hb_1(\xi_n^h, u_n^h)/2)/Q]$, set $\xi_{1,n+1}^h - \xi_{1,n}^h = h$
and $\xi_{2,n+1}^h - \xi_{2,n}^h = 0$. If $\chi_n$ falls in $[(q_{1,1} + hb_1(\xi_n^h, u_n^h)/2)/Q, 2q_{1,1}/Q]$, then set
$\xi_{1,n+1}^h - \xi_{1,n}^h = -h$ and $\xi_{2,n+1}^h - \xi_{2,n}^h = 0$. Do the analog for the second
component, using the two intervals of length $q_{2,2}/Q$. If $\chi_n$ falls in the next to
last of the six subintervals, then set $\xi_{n+1}^h - \xi_n^h = (h, h)$, and equal to $(-h, -h)$
if $\chi_n$ falls in the last of the six subintervals. Define $\delta\tilde w_n^h$ by repeating the above
with $b(x,\alpha) = 0$ and premultiplying by $\sigma^{-1}$. The procedure is analogous in any
dimension.
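
The mapping from a uniform sample to a state increment in the two-dimensional example can be sketched as follows. This is an illustration only: it assumes $q_{1,2} \ge 0$, drift contributions of size $hb_i/2$ in the axis directions, and $Q = 2q_{1,1} + 2q_{2,2} + 2q_{1,2}$; the function names and the numerical values are hypothetical.

```python
from fractions import Fraction

def cutpoints_2d(h, b1, b2, q11, q22, q12):
    """Right endpoints of the first five subintervals of [0, 1]."""
    Q = 2 * q11 + 2 * q22 + 2 * q12           # assumes q12 >= 0
    c1 = (q11 + h * b1 / 2) / Q               # step (+h, 0)
    c2 = 2 * q11 / Q                          # step (-h, 0)
    c3 = c2 + (q22 + h * b2 / 2) / Q          # step (0, +h)
    c4 = c2 + 2 * q22 / Q                     # step (0, -h)
    c5 = c4 + q12 / Q                         # step (+h, +h)
    return [c1, c2, c3, c4, c5]

def step_2d(chi, h, cuts):
    """Increment of the chain for a sample chi of a uniform variable."""
    moves = [(h, 0), (-h, 0), (0, h), (0, -h), (h, h)]
    for cut, move in zip(cuts, moves):
        if chi <= cut:
            return move
    return (-h, -h)                           # last subinterval

# Hypothetical values, chosen only for illustration.
h = Fraction(1, 10)
cuts = cutpoints_2d(h, b1=Fraction(2), b2=Fraction(-1),
                    q11=Fraction(1), q22=Fraction(1), q12=Fraction(1, 2))
print(cuts)
print(step_2d(Fraction(1, 5), h, cuts))
```

Setting `b1 = b2 = 0` in `cutpoints_2d` gives the control-independent partition from which $\delta\tilde w_n^h$ would be obtained after premultiplying the increment by $\sigma^{-1}$.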

Comment on Case 2. For ease of presentation, let us work in two dimensions,
where only the first component of $x(\cdot)$ has a Wiener process driving term. Then
$\bar b = \max_{x,\alpha}|b_2(x,\alpha)|$ and $Q = 2q_{1,1} = \sigma_1^2$. Slightly modifying the procedure
used for Case 1, divide the unit interval into successive subintervals of lengths

\[
\frac{q_{1,1}}{Q + h\bar b}, \quad \frac{q_{1,1}}{Q + h\bar b}, \quad \frac{h\bar b}{Q + h\bar b},
\]

and divide the last subinterval into two further subintervals of lengths

\[
h|b_2(x,\alpha)|/[Q + h\bar b], \quad h[\bar b - |b_2(x,\alpha)|]/[Q + h\bar b].
\]

Analogously to what was done in the one-dimensional example of Case 1, if the
random sample of $\chi_n$ falls in $[0, Q/(Q + h\bar b)]$ then set $\xi_{2,n+1}^h - \xi_{2,n}^h = 0$. If it falls
in the complementary interval $(Q/(Q + h\bar b), 1]$, then set $\xi_{1,n+1}^h - \xi_{1,n}^h = 0$. If the
random sample of $\chi_n$ falls in the last subinterval, then set $\xi_{2,n+1}^h - \xi_{2,n}^h = 0$. If it
falls into the next to last subinterval, then set $\xi_{2,n+1}^h - \xi_{2,n}^h = h\,\mathrm{sign}(b_2(\xi_n^h, u_n^h))$.
If it falls into the first (resp., second) subinterval, then set $\xi_{1,n+1}^h - \xi_{1,n}^h = h$
(resp., $-h$). These constructions yield (5.14) with $\Delta t^h = h^2/[Q + h\bar b]$.

⁸ Actually, martingales when evaluated at the $t_n^h$, but the difference is unimportant.

To get $\delta\tilde w_n^h$, repeat the procedure with $b(\cdot) = \bar b = 0$ and divide by $\sigma_1$. In
particular, $\delta\tilde w_{1,n}^h = h/\sigma_1$ if $\chi_n$ falls in $[0, q_{1,1}/Q]$. It is $-h/\sigma_1$ if $\chi_n$ falls in
$(q_{1,1}/Q, 1]$. The variance is $h^2/Q$. The value of the second component is
unimportant since it is eventually multiplied by zero. So, let us use an independent
Bernoulli sequence with values $\pm h/\sqrt{Q}$, each taken with probability
1/2.

The $\delta\zeta_n^h$ terms for this and the previous example compensate for the errors
and are computed using a procedure that is analogous to that in the first Case 1 example.

■ 

The theorem implies that $\xi^h(\cdot)$ can be written in the form

\[
\xi^h(t) = x(0) + \int_0^t\!\int_U b(\xi^h(s),\alpha)\, r^h(d\alpha,s)\,ds + \int_0^t \sigma(\xi^h(s))\,d\tilde w^h(s) + \epsilon_2^h(t), \tag{5.15}
\]

where $\epsilon_2^h(\cdot)$ equals $\epsilon_1^h(\cdot)$ plus a stochastic integral with respect to $\zeta^h(\cdot)$, and
satisfies (5.6). Since the martingale $\tilde w^h(\cdot)$ does not depend on the control and is
essentially the sum of i.i.d. zero mean random variables of size $O(h)$, the form
(5.15) can be used to obtain approximation theorems of the type in Theorems
3.1-3.3. The controls can be space and time discretized with arbitrarily small
change in the costs, just as in the cited theorems. For Case 1, the quadratic
variation process of $\tilde w^h(\cdot)$ is $It$. For Case 2, it is $It[1 + h\bar b/Q]$.

Theorem 5.4. Assume (A2.1) and the models of Theorem 5.3. Define

\[
\tilde\xi^h(t) = x(0) + \int_0^t\!\int_U b(\tilde\xi^h(s),\alpha)\, r^h(d\alpha,s)\,ds + \int_0^t \sigma(\tilde\xi^h(s))\,d\tilde w^h(s). \tag{5.16}
\]

Then, for each $t > 0$,

\[
\lim_{h\to 0}\ \sup_{x(0),\,r^h} E\sup_{s\le t}\left|\xi^h(s) - \tilde\xi^h(s)\right|^2 = 0. \tag{5.17}
\]

If (A2.2) is assumed as well, then the costs for the two processes are arbitrarily
close, uniformly in the control and initial condition.

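Now, given $(\mu, \delta, \Delta) > 0$, let $u_i^{h,\mu,\delta,\Delta}(\cdot)$ be the delayed and discretized
approximation of $r_i^h(\cdot)$ that would be defined by the procedure above Theorem 3.3,
with relaxed control representation of the pair ($i = 1, 2$) of approximations being
$r^{h,\mu,\delta,\Delta}(\cdot)$. Define the system

\[
\begin{aligned}
\tilde\xi^{h,\mu,\delta,\Delta}(t) = x(0) &+ \int_0^t\!\int_U b(\tilde\xi^{h,\mu,\delta,\Delta}(s),\alpha)\, r^{h,\mu,\delta,\Delta}(d\alpha,s)\,ds \\
&+ \int_0^t \sigma(\tilde\xi^{h,\mu,\delta,\Delta}(s))\,d\tilde w^h(s).
\end{aligned}
\tag{5.18}
\]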

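Then for $t > 0$ and $\gamma > 0$ there are positive numbers $\mu_\gamma, \delta_\gamma, \Delta_\gamma, h_\gamma, \kappa_\gamma$, such
that for $\mu \le \mu_\gamma$, $\delta \le \delta_\gamma$, $\Delta \le \Delta_\gamma$, $h \le h_\gamma$, $\delta/\Delta \le \kappa_\gamma$ we have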


sup  E sup 

r'‘,a:(0)  s<t 


2 

<  7- 


(5.19) 


If (A2.2) is assumed as well, then for small $(\mu, \delta, \Delta, h)$ the costs are arbitrarily
close, uniformly in the control and initial condition.


Comment  on  the  proof.  The  proof  of  the  various  assertions  follows  the  lines 
of  arguments  used  in  Theorem  5.1,  exploiting  the  martingale  properties  and  the 
Lipschitz  condition.  The  details  are  very  similar  and  are  omitted. 

The terms $[\tilde w^h(n\Delta + \Delta) - \tilde w^h(n\Delta)]$, $n = 0, 1, \ldots$, are i.i.d. and have orthogonal
components. For Case 1, the covariance is $\Delta$ times the identity matrix and for
Case 2, it is $\Delta I[1 + h\bar b/Q]$, and the terms converge to normally distributed
random variables as $h \to 0$. It will be useful to quantify this closeness for use in
the next section. This will be done in Theorem 5.6, which requires the following
strong approximation theorem for i.i.d. random variables.

Lemma 5.5. [2, Theorem 3.] Let $\{\phi_n\}$ be a sequence of vector-valued i.i.d. random
variables with zero mean and bounded $(2 + \delta)$th moment, where $0 < \delta < 1$.
Suppose that the covariance matrix $\Gamma$ is non-singular. Then without changing
the distribution, one can redefine the sequence on a richer probability space
together with a Wiener process $B(\cdot)$ with covariance matrix $\Gamma$ such that

\[
\left|\sum_{i\le n}\phi_i - B(n)\right| = o(n^{0.5-c}) \tag{5.20}
\]

w.p.1 for large $n$, for some $0 < c < 0.5$.

The following theorem asserts that the interpolated chain can be written
essentially as the discrete time system (2.5), which we now write as $x^{\mu,\delta,\Delta}(\cdot)$
when the discretized controls are used.
Theorem 5.6. Assume (A2.1) and (A2.2) and the models used in Theorem 5.3.
Then we can define the probability space such that $\tilde w^h(t) = w(t) + \rho^h(t)$, where
$w(\cdot)$ is a vector-valued Wiener process with covariance matrix the identity. For
each $t > 0$, $\sup_{s\le t}|\rho^h(s)| \to 0$ and $E\sup_{s\le t}|\rho^h(s)|^2 \to 0$ as $h \to 0$. Let $x^{\mu,\delta,\Delta}(\cdot)$
be the solution to (2.5) with the same Wiener process $w(\cdot)$ and with the controls
that are used in (5.18). Then, for any $t > 0$,

\[
\lim_{h\to 0}\ \sup_{r^h,\,x(0)} E\sup_{s\le t}\left|x^{\mu,\delta,\Delta}(s) - \tilde\xi^{h,\mu,\delta,\Delta}(s)\right|^2 = 0. \tag{5.21}
\]


Proof. Since we have assumed that the same controls are used for both systems
(2.5) and (5.18), some explanation is needed. Define $\phi_n$ by $\delta\tilde w_n^h = \phi_n\sqrt{h^2/Q}$.
This can be done since the parameter $h$ is only a linear scale factor in the
construction of the $\delta\tilde w_n^h$. Then $\{\phi_n\}$ satisfies the conditions of Lemma 5.5 and
we can suppose that the probability space is such that (5.20) holds for some
Wiener process $B(\cdot)$, whose covariance matrix will be the identity. Then, on
this probability space define $\delta\tilde w_n^h$ in terms of the $\phi_n$, as above.

Now, starting with the $\delta\tilde w_n^h$, one can define (possibly by enlarging the probability
space) the chain $\xi_n^h$ and controls $u_n^h$ so that they have the same law as
originally. This can be done by using a procedure that is similar to the construction
in Theorem 5.2. One starts with the sets in the probability space on
which the $\delta\tilde w_n^h$ take their particular values. Then modify them by sets whose
probabilities are $h|b_i(\xi_n^h, u_n^h)|/Q$, analogously to what was done in Theorem 5.2.
One can then construct the chain and controls recursively so that the law of
the original process is unchanged. I.e., starting with $x(0)$, get $u_0^h$, which is a
(possibly random) function of $x(0)$. Then compute $\delta\tilde w_0^h$ and then $\xi_1^h$ as just
described, and continue. Given the controls $u_n^h$, they can be time and space
discretized and delayed,⁹ as in Theorem 5.4.

From Lemma 5.5, we have, w.p.1 for large $n$,

\[
\left|\sum_{i=0}^{n} h\phi_i - hB(n)\right| = o(h\,n^{0.5-c}). \tag{5.22}
\]


Consider Case 1. The process $w(\cdot) = hB(\cdot/\Delta t^h)/\sqrt{Q}$ is a Wiener process
whose covariance is the identity. By the above arguments and (5.22), there is a
constant $c > 0$ and a $t_h \to 0$ as $h \to 0$ such that

\[
\left|\sum_{i=0}^{t/\Delta t^h}\delta\tilde w_i^h - w(t)\right| = o\!\left(t^{0.5-c}[\Delta t^h]^c\right) \tag{5.23}
\]

w.p.1 for $t \ge t_h$ and small $h$. For Case 2, the result is the same since the

⁹ Actually, it is only required that the controls be approximated and delayed such that the
control  applied  on  [nA,nA  + A)  is  -measurable.  The  other  aspects  of  the  discretization 

are  not  needed. 




difference between the $\tilde w^h(\cdot)$ processes for the two cases is

\[
\sum_{i=tQ/h^2}^{t(Q+h\bar b)/h^2}\delta\tilde w_i^h,
\]

and the sup of this over any finite interval goes to zero in mean square as $h \to 0$.

We can suppose that $w(\cdot)$ is the process that yields the $w(n\Delta)$
in (2.5). We can also suppose that the controls that were constructed for the
chain are applied to (2.5). By the above arguments concerning the approximation
of $\tilde w^h(\cdot)$ by $w(\cdot)$, we can write $\tilde w^h(t) = w(t) + \rho^h(t)$, where the process
$\rho^h(\cdot)$ has independent increments, and $\lim_{h\to 0} E\sup_{s\le t}|\rho^h(s)|^2 = 0$. From this
point on the proof is standard, using the Lipschitz condition and the martingale
properties. ■


6  An Approximate Equilibrium for the Diffusion Process is an
Approximate Equilibrium for the Chain and Vice Versa


Representations of the transition probability and controls. In the next
two theorems, we will use the representations of the transitions of the Markov
chain in terms of the i.i.d. random variables $\{\chi_n\}$ discussed in the paragraph
after (5.1), and the similar representation for the realizations of the rule (4.2)
in terms of the random variables $\{\theta_k\}$ noted in the discussion just below the
statement of Theorem 4.1. This assures that the sample path of the approximating
chain depends only on the selected control values, and that the selected
control value in (4.2) depends only on the past values of the control and Wiener
process.

Theorem 6.1. Assume (A2.1), (A2.2), and (A4.1). An $\epsilon$-equilibrium value for
(2.1) or (2.3) is an $\epsilon_1$-equilibrium value for the approximating Markov chain,
where $\epsilon_1 \to 0$ as $\epsilon \to 0$.

Proof. Let $\epsilon > 0$ be given. By (A4.1), there is an $\epsilon$-equilibrium strategy
pair for (2.3) under which the solution to (2.3) is well defined. By Theorem 4.1,
without loss of generality, and for small enough $\mu$, $\delta$ and $\Delta$, it can be represented
as in (4.2), where we can suppose that $\Delta/\delta$ is an integer, and the $p_{i,k}(\cdot)$ are
continuous in the $w$ variables. We can suppose, w.l.o.g., that for each $n$, $k$ and
$i$, the rule (4.2) is defined for all possible conditioning $u$-sequences with values
in $U_i^\mu$, $i = 1, 2$. Let $\bar c_1^\Delta(\cdot), \bar c_2^\Delta(\cdot)$ denote this strategy pair. The strategies
depend on $\mu$ and $\delta$ as well as on $\Delta$, but for simplicity we suppress that in the
notation. Recall that when a strategy that is defined by a rule such as (4.2) is
applied to an arbitrary relaxed control, the formula (4.2) is actually applied to
the space-time discretization of that relaxed control, as defined above Theorem
3.3. These strategies $\bar c_i^\Delta(\cdot)$ will need to be adapted for use on the chain. To do
this, simply replace the $w(\cdot)$-samples in (4.2) by samples of the $\tilde w^h(\cdot)$ process
that was defined in the last section. Keep in mind that these strategies are used
for theoretical purposes only, to prove a convergence theorem. They are not for
practical implementation. For each integer $k$, the control value that
is obtained from the rule (4.2) with $\tilde w^h(\cdot)$ used will be applied to the chain for
all steps $m$ such that $t_m^h \in [k\delta, k\delta + \delta)$. The resulting strategies for the chain
will  be  denoted  by  and  are  in  Cf . 

We want to show that for small enough $(\mu, \Delta, \delta)$, there are $\epsilon_0 > 0$ and $h_0 > 0$,
where $\epsilon_0 \to 0$ as $\epsilon \to 0$, such that for $h \le h_0$ and any sequence $r_i^h(\cdot)$ of admissible
relaxed (or ordinary) controls for the chain,

\[
\begin{aligned}
W_1^h(x, \bar c_1^{\Delta,h}, \bar c_2^{\Delta,h}) &\ge W_1^h(x, r_1^h, \bar c_2^{\Delta,h}) - \epsilon_0,\\
W_2^h(x, \bar c_1^{\Delta,h}, \bar c_2^{\Delta,h}) &\ge W_2^h(x, \bar c_1^{\Delta,h}, r_2^h) - \epsilon_0.
\end{aligned}
\tag{6.1}
\]

The notation $W_1^h(x, \bar c_1^{\Delta,h}, r_2^h)$ implies that player 1 uses strategy $\bar c_1^{\Delta,h}(\cdot)$ and
player 2 uses relaxed control $r_2^h(\cdot)$ (in continuous time interpolation notation), or
an ordinary control with this relaxed control representation, with the analogous
interpretation when the indices are reversed. The notation $W_i^h(x, \bar c_1^{\Delta,h}, \bar c_2^{\Delta,h})$
implies that player $i$ uses strategy $\bar c_i^{\Delta,h}(\cdot)$, $i = 1, 2$.

Suppose that the pair $\bar c_i^{\Delta,h}(\cdot)$, $i = 1, 2$, is used for the chain. Let $r_i^{h,\mu,\delta,\Delta}(\cdot)$, $i = 1, 2$,
denote the (continuous time interpolation notation) relaxed control representation
of the control actions. Let $\xi^h(\cdot)$, $\tilde w^h(\cdot)$, and $\tau^h$ denote the corresponding
interpolation of the chain, the “pre-Wiener” process, and the exit time, resp.
The sequence $(\xi^h(\cdot), r_1^{h,\mu,\delta,\Delta}(\cdot), r_2^{h,\mu,\delta,\Delta}(\cdot), \tilde w^h(\cdot), \tau^h)$ is tight. Select a weakly
convergent subsequence with limit denoted by $(x(\cdot), r_1(\cdot), r_2(\cdot), w(\cdot), \tau)$, where
$(x(\cdot), r_1(\cdot), r_2(\cdot), I_{\{\tau<\cdot\}})$ is non-anticipative with respect to the standard vector-valued
Wiener process $w(\cdot)$, and the set $(x(\cdot), r_1(\cdot), r_2(\cdot), w(\cdot))$ solves (2.3). The
limit $\tau$ is the first hitting time of the boundary of $G$ by the limit process $x(\cdot)$.
The details concerning the tightness, characterization of the limit processes and
boundary hitting times, and that they solve (2.3), are the same as for the control
problem in [15, Chapters 10, 11].

Henceforth, when weakly convergent sequences are dealt with, and when needed
for simplicity in the argument, we will suppose (without loss of generality) that
the Skorohod representation is used so that all processes are defined on the same
probability space and the weak convergence is equivalent to convergence with
probability one in the appropriate topology [5, Theorem 1.8, Chapter 3].

Under the Skorohod representation, the rule (4.2) with the $\tilde w^h(\cdot)$-samples used
converges w.p.1 to the same rule with the $w(\cdot)$-samples used, due to the convergence
$\tilde w^h(\cdot) \to w(\cdot)$ and the continuity of the probabilities in (4.2) in the
$w$-variables. Because of this, the limits $r_i(\cdot)$, $i = 1, 2$, are just realizations of
the original $\epsilon$-equilibrium strategies $\bar c_i^\Delta(\cdot)$, $i = 1, 2$. Since the solution to (2.1) or
(2.3) is unique for each admissible pair (control, Wiener process), we can conclude
that the probability law of any limit set $(x(\cdot), r_1(\cdot), r_2(\cdot), w(\cdot))$ is the same,
no matter what the selected convergent subsequence. Hence the original set of
processes (before the subsequence was taken) converges weakly to this (unique
in the sense of probability law) limit set, where the control is determined by the
rules $\bar c_i^\Delta(\cdot)$, $i = 1, 2$.

By  the  weak  convergence 


Wi(x,cf,c^) 

W2(x,c^,c^) 


W,^(x,ct’\ 

W2^(x,ct’\ 


W,\x, 


rh 

1  I  <-2 


. 


W^{x, 


-A,h 


(6.2) 

,r^)- 


(6.3) 

It can be shown by a weak convergence argument working with the chain for
any fixed $h > 0$ that the maximizing controls $\hat r_i^h(\cdot)$ exist. But we need only work
with control processes that approximate the maximum values arbitrarily well, and
we suppose that the $\hat r_i^h(\cdot)$ are such controls.

It will be shown that

\[
\limsup_{h\to 0} W_1^h(x, \hat r_1^h, \bar c_2^{\Delta,h}) \le W_1(x, \bar c_1^\Delta, \bar c_2^\Delta) + \epsilon + \rho(\mu, \delta, \Delta), \tag{6.4}
\]

where $\rho(\mu, \delta, \Delta) \to 0$ as $(\mu, \delta, \Delta) \to 0$, with the analogous result for indices
1, 2 interchanged. Inequalities (6.2), (6.3), and (6.4) imply that if player 2 uses
$\bar c_2^{\Delta,h}(\cdot)$ then player 1 cannot do better (asymptotically, as $h \to 0$ and modulo
$\rho(\mu, \delta, \Delta) + \epsilon$) than by using $\bar c_1^{\Delta,h}(\cdot)$, with the analogous result holding for the
other player. This last fact implies the theorem since $(\mu, \delta, \Delta)$ can be made as
small as desired.

Now  (6.4)  will  be  shown.  Let  denote  the  values  that  are  ob¬ 

tained  from  fi{-)  by  the  space  and  time  discretization  given  above  Theorem 
3.3,  and  which  are  used  by  the  rule  Let  denote  the  con¬ 

trol  choices  for  player  2,  based  on  the  rule  c^’^(-)  and  the  control  of  player. 
Let  ’  ’^(•)  denote  the  (continuous  time)  relaxed  control  representation  of 
The  processes  C^(-)  and  w^{-)  now  denote  the  interpolation  of 
the  chain  and  the  pre-Wiener  process,  resp.,  under  the  strategy  c^’^(-)  and  con¬ 
trol  (•).  This  w^{-)  process  will  be  fixed  for  each  ft  and  used  in  the  rest  of  the 
proof. 

Define  the  process  j^y  (5.12),  driven  by  the  {uf’'^’^’^(W)},  i  =  1,2, 

and  w^{-).  Note  that  {u2’^’'^’'^{l6)}  is  the  response  of  c^’^(-)  to  any  control  of 
player  1  with  discretization  By  Theorem  5.1,  we  have,  for  small 

ft. 


Wtia 


'-h  -A,h 

ri,C2’ 


)  _  pp'M.'SAAj' 


Tl  )  ^2  ) 


<  Pi  (ft,  <5,  A) 


(6.5) 


where  pi(ft,  6,  A)  can  be  made  arbitrarily  small,  uniformly  in  ft^(-),  as  (/i,  S,  A,  ft) 
0.  Also, 

Let  denote  the  first  hitting  time  of  the  boundary  for 
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The  set  is  tight.  Extract 

a  weakly  convergent  subsequence,  and  index  it  by  h  also.  Denote  the  limit  of  the 
weakly  convergent  subsequence  by  (a;(-),  fi(-),  w(-),t).  Then, 

as  was  the  case  in  an  earlier  part  of  the  proof,  (a;(-), fi(-), rf 
w{-),  I^T<  })  is  non-anticipative  with  respect  to  the  standard  Wiener  process 
w{-),  the  set  (x(-) ,  (■) ,  (■) ,  wl-))  satisfies  (2.3),  and  r  is  the  first  hitting 

time  of  the  boundary.  The  is  just  the  relaxed  control  that  is  defined 

by  the  weak  sense  limit  {u^'^’‘^{l6)}  of 

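We need to show that the limits $u_2^{\mu,\delta,\Delta}(l\delta)$ are chosen by the conditional
probability law that determines $\bar c_2^\Delta(\cdot)$. I.e., that (along the selected subsequence)

\[
\begin{aligned}
&p_{2,k}\left(\alpha_2;\ \tilde w^h(l\Delta), l \le n;\ u_j^{h,\mu,\delta,\Delta}(l\delta), j = 1, 2, l\delta < n\Delta\right) \\
&\qquad \to p_{2,k}\left(\alpha_2;\ w(l\Delta), l \le n;\ u_j^{\mu,\delta,\Delta}(l\delta), j = 1, 2, l\delta < n\Delta\right)
\end{aligned}
\tag{6.7}
\]

for $k\delta \in [n\Delta, n\Delta + \Delta)$. In (6.7), the $\tilde w^h(\cdot)$ can be replaced by its limit $w(\cdot)$ due to
the continuity in $w$. Since there are only a finite number of values for the control,
for any $t < \infty$ the limit $\{u_1^{\mu,\delta,\Delta}(l\delta), u_2^{\mu,\delta,\Delta}(l\delta), l\delta \le t\}$ will be achieved after a
finite number of steps through the convergent subsequence, w.p.1. This implies
(6.7). [We will comment further on this point at the end of the proof.] Thus
the policy $\bar c_2^\Delta(\cdot)$ acting on any relaxed control with discretization $\{u_1^{\mu,\delta,\Delta}(l\delta)\}$
will yield the sequence $\{u_2^{\mu,\delta,\Delta}(l\delta)\}$. Thus,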


M^i(a 


^M.5.A(,)^-A^ 


'2  ) 


and  by  (6.5)  and  (6.6),  mod  pi(^,  (5,  A), 


W(‘(a;,f^c^’'*)  ^  Wi(a;,<’‘^’'^(-),c^). 


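We can conclude that

\[
\begin{aligned}
\lim_{h\to 0} W_1^h(x, \hat r_1^h, \bar c_2^{\Delta,h}) &\le W_1(x, r_1^{\mu,\delta,\Delta}(\cdot), \bar c_2^\Delta) + \rho_1(\mu, \delta, \Delta) \\
&\le W_1(x, \bar c_1^\Delta, \bar c_2^\Delta) + \rho_1(\mu, \delta, \Delta) + \epsilon,
\end{aligned}
\tag{6.8}
\]

where the $\epsilon$ is due to the fact that $(\bar c_1^\Delta(\cdot), \bar c_2^\Delta(\cdot))$ is an $\epsilon$-equilibrium. The arbitrariness
of the subsequence implies (6.4). The same argument is used when
the indices 1, 2 are reversed.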

Finally, let us comment on (6.7). Recall that the discretizations given above
Theorem 3.3 use fixed (and asymptotically unimportant) values on the initial
interval $[0, \Delta)$, so let us use $u_1^{h,\mu,\delta,\Delta}(l\delta) = u_1(l\delta)$, $u_2^{h,\mu,\delta,\Delta}(l\delta) = u_2(l\delta)$ for fixed
$u_i(l\delta)$ and $l\delta < \Delta$. For $k\delta \in [\Delta, 2\Delta)$, we have the rule

\[
p_{2,k}\left(\alpha_2;\ \tilde w^h(\Delta);\ u_1(l\delta), u_2(l\delta), l\delta < \Delta\right) \tag{6.9}
\]

and the probability of selecting any $\alpha_2 \in U_2^\mu$ converges as $\tilde w^h(\cdot) \to w(\cdot)$.
Then the limit in (6.9) must be the law of $u_2^{\mu,\delta,\Delta}(l\delta)$, $\Delta \le l\delta < 2\Delta$. Using
the method of selecting the control values in terms of the $\theta_l$ that was
recalled above the theorem statement, we can suppose that the convergence
$u_2^{h,\mu,\delta,\Delta}(l\delta) \to u_2^{\mu,\delta,\Delta}(l\delta)$, $\Delta \le l\delta < 2\Delta$, occurs in a finite number of steps w.p.1,
as $h \to 0$ through the convergent subsequence, with the rule (6.9) used. Next,
on $[2\Delta, 2\Delta + \Delta)$, we have the rule

\[
p_{2,k}\left(\alpha_2;\ \tilde w^h(l\Delta), l \le 2;\ u_i(l\delta), l\delta < \Delta;\ u_i^{h,\mu,\delta,\Delta}(l\delta), \Delta \le l\delta < 2\Delta, i = 1, 2\right).
\]

The $u_1^{h,\mu,\delta,\Delta}(l\delta)$, $\Delta \le l\delta < 2\Delta$, can be assumed to converge in a finite number
of steps as well, w.p.1. Hence, as above, so do the selected values of
$u_2^{h,\mu,\delta,\Delta}(l\delta)$, $2\Delta \le l\delta < 2\Delta + \Delta$. Continuing in this way yields the form (6.7). ■

The  converse  result. 

If the $\epsilon$-equilibrium value for the chain is unique for arbitrarily small $\epsilon$, then
the converse result is true; namely, $\epsilon$-equilibrium values for the chain are
$\epsilon_1$-equilibrium values for (2.3), where $\epsilon_1 \to 0$ as $\epsilon \to 0$, and we are done, since
Theorem 6.1 then implies that the $\epsilon$-equilibrium values for the diffusion are also
unique for small $\epsilon$, and that the numerical solutions will converge to the desired
value. If the $\epsilon$-equilibrium value for the chain is not unique for arbitrarily small
$\epsilon$, then we will show that this “converse” assertion is true for the model used in
Theorem 5.2. We are not able to show the converse result when $\sigma(\cdot)$ depends
on $x$.

Theorem 6.2. Assume (A2.1) and (A2.2) and the model used in Theorem 5.2.
Then for any $\epsilon > 0$ there is $\epsilon_1 > 0$ which goes to zero as $\epsilon \to 0$ such that an
$\epsilon$-equilibrium value for the chain for small $h$ is an $\epsilon_1$-equilibrium value for
(2.3).


Proof. Theorem 5.4 says that the paths and cost functions for (5.15) (which
is $\xi^h(\cdot)$ under an arbitrary control), (5.16) (where the control is as in (5.15)
but the driving process is $\tilde w^h(\cdot)$), and (5.18) (which is (5.16) with discretized
controls) are arbitrarily close, uniformly in the controls, for small $(\mu, \delta, \Delta, h)$.
Theorem 5.6 gives the same result for (5.18) and $x^{\mu,\delta,\Delta}(\cdot)$, which is (2.5) with
discretized controls. Theorem 3.3 implies the same thing for $x^{\mu,\delta,\Delta}(\cdot)$ and (2.3).
This yields the result. ■


References 

[1]  M. Bardi, M. Falcone, and P. Soravia. Numerical methods for pursuit-evasion
games via viscosity solutions. In M. Bardi, T.E.S. Raghavan, and
T. Parthasarathy, editors, Stochastic and Differential Games: Theory and
Numerical Methods. Birkhäuser, Boston, 1998.

[2]  I.  Berkes  and  W.  Philipp.  Approximation  theorems  for  independent  and 
weakly  dependent  random  vectors.  Ann.  Probab.,  7:29-54,  1979. 




[3]  E. Altman, O. Pourtallier, A. Haurie, and F. Moresino. Approximating
Nash equilibria in nonzero-sum games. International Game Theory Review,
2:155-172, 2000.

[4]  R.J.  Elliott  and  N.J.  Kalton.  Existence  of  value  in  differential  games,  Mem. 
AMS,  126.  Amer.  Math.  Soc,  Providence,  RI,  1974. 

[5]  S.N.  Ethier  and  T.G.  Kurtz.  Markov  Processes:  Characterization  and  Con¬ 
vergence.  Wiley,  New  York,  1986. 

[6] W.H. Fleming. Generalized solutions in optimal stochastic control. In P.T.
Liu, E. Roxin, and R. Sternberg, editors, Differential Games and Control
Theory: III, pages 147-165. Marcel Dekker, 1977.

[7] W.H. Fleming and P.E. Souganidis. On the existence of value functions for
two-player zero-sum differential games. Indiana Univ. Math. J.,
38:293-314, 1989.

[8] A.B. Haurie, J.B. Krawczyk, and M. Roche. Monitoring cooperative
equilibria in a stochastic differential game. J. of Optimiz. Theory and Applic.,
81:73-95, 1994.

[9] A.B. Haurie and F. Moresino. Computing equilibria in stochastic games of
intergenerational equity. In International Conference on Dynamic Games.
International Society of Dynamic Games, 2004.

[10] H.J. Kushner. Probability Methods for Approximations in Stochastic
Control and for Elliptic Equations. Academic Press, New York, 1977.

[11] H.J. Kushner. Numerical methods for stochastic control problems in
continuous time. SIAM J. Control Optim., 28:999-1048, 1990.

[12]  H.J.  Kushner.  Numerical  methods  for  stochastic  differential  games.  SIAM 
J.  Control  Optim.,  pages  457-486,  2002. 

[13]  H.J.  Kushner.  Numerical  approximations  for  stochastic  differential  games: 
The  ergodic  case.  SIAM  J.  Control  Optim.,  42:1911-1933,  2004. 

[14]  H.J.  Kushner.  Numerical  methods  for  stochastic  differential  games:  The 
ergodic  cost  criterion.  In  S.  Jorgensen,  M.  Quincampoix,  and  T.  Vincent, 
editors,  Annals  of  the  International  Society  of  Dynamic  Games.  2005.  To 
appear. 

[15] H.J. Kushner and P. Dupuis. Numerical Methods for Stochastic Control
Problems in Continuous Time. Springer-Verlag, Berlin and New York,
1992. Second edition, 2001.

[16]  D.W.  Stroock  and  S.R.S.  Varadhan.  On  degenerate  elliptic  and  parabolic 
operators  of  second  order  and  their  associated  diffusions.  Comm.  Pure 
Appl.  Math.,  25:651-713,  1972. 


[17] M. Tidball. Undiscounted zero-sum differential games with stopping times.
In G.J. Olsder, editor, New Trends in Dynamic Games and Applications.
Birkhäuser, Boston, 1995.

[18] M. Tidball and R.L.V. Gonzalez. Zero-sum differential games with
stopping times: Some results and about its numerical resolution. In T. Basar
and A. Haurie, editors, Advances in Dynamic Games and Applications.
Birkhäuser, Boston, 1994.

